In [2]:
import numpy as np 
import pandas as pd

## What is Data?
Data is a collection of facts, information, and statistics that can take various forms, such as numbers, text, sound, images, or any other format. It is the raw material from which information and knowledge are derived. Data can be measured, collected, reported, and analyzed, and it is often visualized using graphs, images, or other analysis tools.

## What is Information?
Information is data that has been processed, organized, or structured in a way that makes it meaningful, valuable, and useful. It is data that has been given context, relevance, and purpose. It provides knowledge, understanding, and insights that can be used for decision-making, problem-solving, communication, and other purposes.

## Categories of Data
Data can be categorized into two main types:

* Structured Data: This type of data is organized into a specific format, which makes it easy to search, analyze, and process. Structured data is typically stored in relational databases and includes information such as numbers, dates, and categories.
* Unstructured Data: Unstructured data does not conform to a specific structure or format. It includes text documents, images, videos, and other data that cannot easily be organized or analyzed without additional processing.

## Types of Data
### 1. Quantitative Data (Numerical Data)
Quantitative data represents numerical values and can be measured or counted. It is further classified into the following categories:

#### a. Discrete Data
Discrete data consists of distinct, separate values that can be counted as whole numbers. Examples include the number of students in a class, students' marks on a test, and the number of cars in a parking lot. Discrete data is often visualized using bar charts.

#### b. Continuous Data
Continuous data represents measurements that can take any value within a given range. Examples include temperature, height, weight, and salary. Continuous data is often visualized using histograms and line charts.

#### c. Time-Series Data
Time-series data is collected or recorded over a sequence of equally spaced time intervals. It represents how a particular variable changes over time. Examples include daily stock prices, weather data, and monthly sales figures. Time-series data is typically visualized using line charts.
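To see how these kinds of quantitative data typically appear in pandas, here is a minimal sketch with made-up values (the class sizes, heights, dates, and sales figures are all hypothetical). Counts usually end up in an integer dtype, measurements in a float dtype, and a time series is a Series whose index is a DatetimeIndex, here built with pd.date_range.

```python
import pandas as pd

# Discrete data: whole-number counts (hypothetical class sizes) -> integer dtype
students_per_class = pd.Series([28, 31, 25, 30], name="students")

# Continuous data: measurements that can take any value in a range -> float dtype
heights_cm = pd.Series([172.4, 165.0, 181.25, 158.9], name="height_cm")

# Time-series data: values indexed by equally spaced timestamps
# (hypothetical daily sales over four consecutive days)
dates = pd.date_range(start="2024-01-01", periods=4, freq="D")
daily_sales = pd.Series([120, 135, 128, 150], index=dates, name="sales")

print(students_per_class.dtype)
print(heights_cm.dtype)
print(type(daily_sales.index).__name__)  # DatetimeIndex

# Day-to-day change: a typical time-series operation (first value is NaN)
print(daily_sales.diff())
```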

### 2. Qualitative Data (Categorical Data)
Qualitative data, also known as categorical data, describes qualities, characteristics, or opinions. It is non-numerical and is used to categorize observations into groups. Qualitative data is further divided into two categories:

#### a. Nominal Data
Nominal data consists of categories or names that cannot be ordered or ranked. Examples include gender (male, female), race (White, Black, Asian), and blood type (A, B, AB, O). Nominal data is commonly analyzed using non-parametric tests such as the chi-squared test and Fisher’s exact test.

#### b. Ordinal Data
Ordinal data consists of categories that can be ordered or ranked, but the distance between categories is not necessarily equal. Examples include education level (Elementary, Middle, High School, College) and job position (Manager, Supervisor, Employee). Ordinal data is commonly analyzed using non-parametric tests such as the Wilcoxon signed-rank test and the Mann-Whitney U test.
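pandas can record the nominal/ordinal distinction directly with its categorical dtype. The sketch below uses illustrative values only: an unordered categorical for blood type (nominal) and an ordered categorical for education level (ordinal). Only the ordered one supports comparisons and min/max according to the declared category order.

```python
import pandas as pd

# Nominal: categories with no inherent order
blood_type = pd.Series(["A", "O", "AB", "B", "O"], dtype="category")

# Ordinal: categories with a meaningful order (the spacing between levels is undefined)
levels = ["Elementary", "Middle", "High School", "College"]
education = pd.Series(
    pd.Categorical(["College", "Middle", "High School", "Elementary"],
                   categories=levels, ordered=True)
)

print(blood_type.cat.ordered)   # False
print(education.cat.ordered)    # True

# Ordered categories compare by position in `levels`, not alphabetically
print(education[education > "Middle"].tolist())  # ['College', 'High School']
print(education.min())                            # Elementary
```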

## Core components of pandas: Series and DataFrames
The two primary components of pandas are the Series and the DataFrame.

A Series is essentially a single column of data, and a DataFrame is a two-dimensional table made up of a collection of Series that share the same row index.

### The Pandas Series Object
A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or array as follows:


In [4]:
arr = np.arange(5)
arr

array([0, 1, 2, 3, 4])

In [5]:
series = pd.Series([0.25, 0.5, 0.75, 1.0])
series

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

In [6]:
series[2]

0.75

In [7]:
series[:3]

0    0.25
1    0.50
2    0.75
dtype: float64

In [8]:
series.index

RangeIndex(start=0, stop=4, step=1)

In [10]:
series.values

array([0.25, 0.5 , 0.75, 1.  ])

In [11]:
# An explicit index gives the Series extra capabilities: it need not be made of
# integers and can hold values of any type. For example, strings can be used as an index:

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [12]:
# data['a']
data['a':'c']

a    0.25
b    0.50
c    0.75
dtype: float64

In [17]:
# Series as a specialized dictionary

population_dict = {'California': 38332521,'Texas': 26448193,'New York': 19651127,'Florida': 19552860,'Illinois': 12882135}
population = pd.Series(population_dict)
population.index

Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')

In [14]:
population['California':'Florida']

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
dtype: int64
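The "specialized dictionary" view can be made a little more concrete. The minimal sketch below reuses the population figures above; "Ohio" is a deliberately missing label chosen only for illustration. Membership tests look at the index (the keys), .get() returns a default for a missing label, and passing an explicit index when building from a dict selects keys, filling unmatched labels with NaN.

```python
import pandas as pd

population_dict = {"California": 38332521, "Texas": 26448193,
                   "New York": 19651127, "Florida": 19552860,
                   "Illinois": 12882135}
population = pd.Series(population_dict)

# Membership tests check the index (the "keys"), not the values
print("Texas" in population)        # True
print(38332521 in population)       # False

# .get() returns a default instead of raising KeyError for a missing label
print(population.get("Ohio", 0))    # 0

# An explicit index selects keys from the dict; labels with no matching
# key ("Ohio" here) become NaN, which turns the dtype into float64
subset = pd.Series(population_dict, index=["Florida", "Ohio"])
print(subset)
```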

### The Pandas DataFrame Object
If a Series is an analog of a one-dimensional array with flexible indices, a DataFrame is an analog of a two-dimensional array with both flexible row indices and flexible column names.

In [20]:
df = pd.DataFrame([[1,2,3], [4,5,6],[7,8,9]])
df

   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9


In [21]:
df = pd.DataFrame([[1,2,3], [4,5,6],[7,8,9]], columns=['A', 'B', 'C'], index=['x','y','z'])
df

   A  B  C
x  1  2  3
y  4  5  6
z  7  8  9


In [29]:
# Creating a Dataframe from a dictionary
data = {
    "Name": ["John", "John", "Peter", "Linda"],
    "Age": [28, 24, 35, 32],
    "Grade": [55, 34, None, 70],
    "Country": ["USA", "UK", "Australia", "Germany"]
}

df = pd.DataFrame(data, index=[1, 2, 3, 4])
df

    Name  Age  Grade    Country
1   John   28   55.0        USA
2   John   24   34.0         UK
3  Peter   35    NaN  Australia
4  Linda   32   70.0    Germany


In [25]:
df.columns

Index(['Name', 'Age', 'Grade', 'Country'], dtype='object')

In [27]:
df.index.to_list()

[1, 2, 3, 4]

In [30]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, 1 to 4
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Name     4 non-null      object 
 1   Age      4 non-null      int64  
 2   Grade    3 non-null      float64
 3   Country  4 non-null      object 
dtypes: float64(1), int64(1), object(2)
memory usage: 160.0+ bytes


In [31]:
df.describe()

             Age      Grade
count   4.000000   3.000000
mean   29.750000  53.000000
std     4.787136  18.083141
min    24.000000  34.000000
25%    27.000000  44.500000
50%    30.000000  55.000000
75%    32.750000  62.500000
max    35.000000  70.000000


In [33]:
df.head(2)

   Name  Age  Grade  Country
1  John   28   55.0      USA
2  John   24   34.0       UK


In [34]:
df.tail(2)

    Name  Age  Grade    Country
3  Peter   35    NaN  Australia
4  Linda   32   70.0    Germany


In [44]:
df.sample(2, random_state= 2)

    Name  Age  Grade    Country
3  Peter   35    NaN  Australia
4  Linda   32   70.0    Germany


In [45]:
df.nunique()

Name       3
Age        4
Grade      3
Country    4
dtype: int64

In [49]:
# df['Grade']
df.Name.unique()

array(['John', 'Peter', 'Linda'], dtype=object)

In [50]:
df.shape

(4, 4)

In [51]:
df.size

16

In [52]:
# Creating a dataframe using multiple pandas series
population

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64

In [53]:
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
area

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
dtype: int64

In [62]:
# states = pd.DataFrame({'population': population,'area': area})
states = pd.DataFrame([population, area], index=['population', 'area'])
states.T

# states

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995
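The commented-out line in the cell above builds the same table from a dict of Series, which yields columns directly with no transpose. Here is a minimal sketch of that approach; the inputs are deliberately altered (hypothetically) so that the alignment is visible. pandas matches rows by index label rather than by position, and a label missing from one Series becomes NaN in that column, which also turns the column into a float dtype.

```python
import pandas as pd

population = pd.Series({"California": 38332521, "Texas": 26448193,
                        "New York": 19651127})
# Hypothetical variation: area lists the states in a different order
# and includes Florida but not New York
area = pd.Series({"Texas": 695662, "California": 423967, "Florida": 170312})

# Each dict key becomes a column name; rows are aligned on the index labels
states = pd.DataFrame({"population": population, "area": area})
print(states)
```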


In [68]:
series1 = pd.Series([2,4,543,32])
series2 = pd.Series([4,534,212,23])
ind = pd.Index([1,2,3,4])

df = pd.DataFrame({'series1':series1, 'series2':series2})
# df = df.set_index(ind)
df.set_index(ind, inplace=True)

df

   series1  series2
1        2        4
2        4      534
3      543      212
4       32       23


### Assignment 1
1. Create a Series object from a list of 10 items using custom index values.
2. Create a Series object from a dictionary of 10 items.
3. Create a DataFrame object from a Series object with custom index values.
4. Create a DataFrame object from a 5-row by 5-column NumPy array of random values. Give each column a title and each row an index label.
5. Separate the DataFrame from question 4 into its individual index, values, and columns.

### Loading DataFrames from files

In [None]:
coffee = pd.read_csv('./datasets/coffee.csv')
# coffee.head()
# coffee.nunique()
# coffee['Coffee Type'].unique()
coffee.describe()

       Units Sold
count   14.000000
mean    32.857143
std      9.346798
min     15.000000
25%     26.250000
50%     35.000000
75%     38.750000
max     45.000000


In [75]:
result = pd.read_feather('./datasets/results.feather')
result.head()

,year,type,discipline,event,as,athlete_id,noc,team,place,tied,medal
0,1912.0,Summer,Tennis,"Singles, Men (Olympic)",Jean-François Blanchy,1,FRA,,17.0,True,
1,1912.0,Summer,Tennis,"Doubles, Men (Olympic)",Jean-François Blanchy,1,FRA,Jean Montariol,,False,
2,1920.0,Summer,Tennis,"Singles, Men (Olympic)",Jean-François Blanchy,1,FRA,,32.0,True,
3,1920.0,Summer,Tennis,"Doubles, Mixed (Olympic)",Jean-François Blanchy,1,FRA,Jeanne Vaussard,8.0,True,
4,1920.0,Summer,Tennis,"Doubles, Men (Olympic)",Jean-François Blanchy,1,FRA,Jacques Brugnon,4.0,False,


In [76]:
result = pd.read_parquet('./datasets/results.parquet')
result.head()

,year,type,discipline,event,as,athlete_id,noc,team,place,tied,medal
0,1912.0,Summer,Tennis,"Singles, Men (Olympic)",Jean-François Blanchy,1,FRA,,17.0,True,
1,1912.0,Summer,Tennis,"Doubles, Men (Olympic)",Jean-François Blanchy,1,FRA,Jean Montariol,,False,
2,1920.0,Summer,Tennis,"Singles, Men (Olympic)",Jean-François Blanchy,1,FRA,,32.0,True,
3,1920.0,Summer,Tennis,"Doubles, Mixed (Olympic)",Jean-François Blanchy,1,FRA,Jeanne Vaussard,8.0,True,
4,1920.0,Summer,Tennis,"Doubles, Men (Olympic)",Jean-François Blanchy,1,FRA,Jacques Brugnon,4.0,False,


In [77]:
olympic = pd.read_excel('./datasets/olympics-data.xlsx')  # .xlsx files require the openpyxl package; large workbooks can take a long time to load
olympic

### Saving a DataFrame to a file

In [None]:
result.to_csv('result.csv')  # the index is written as an extra unnamed first column; pass index=False to omit it
# df = pd.read_csv('result.csv')
# df
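By default to_csv writes the row index as an unnamed first column, and reading the file back with plain read_csv turns it into a column called "Unnamed: 0". A minimal sketch of the round trip, using a small throwaway DataFrame and an in-memory buffer (io.StringIO) instead of a real file, shows the two usual remedies: index_col=0 when reading, or index=False when writing.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Peter"], "Age": [28, 35]}, index=[1, 2])

# Default index=True stores the index as an unnamed first column ...
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
naive = pd.read_csv(buffer)
print(naive.columns.tolist())   # ['Unnamed: 0', 'Name', 'Age']

# ... so either tell read_csv that the first column is the index ...
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(restored.index.tolist())  # [1, 2]

# ... or skip writing the index when it carries no information
plain = io.StringIO(df.to_csv(index=False))
print(pd.read_csv(plain).columns.tolist())  # ['Name', 'Age']
```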

### The Pandas Index Object

In [79]:
ind = pd.Index([2, 3, 5, 7, 11])
ind

Index([2, 3, 5, 7, 11], dtype='int64')

In [80]:
ind[1]

3

In [None]:
ind[::2]  # every second element (step of 2)

Index([2, 5, 11], dtype='int64')

In [83]:
print(ind.size, ind.shape, ind.ndim, ind.dtype)

5 (5,) 1 int64


In [84]:
ind[1] = 0  # raises TypeError because a pandas Index is immutable

TypeError: Index does not support mutable operations

In [85]:
indA = pd.Index([1, 3, 5, 7, 9], dtype= 'int64')
indB = pd.Index([2, 3, 5, 7, 11], dtype= 'int64')

In [86]:
indA.intersection(indB)

Index([3, 5, 7], dtype='int64')

In [87]:

indA.union(indB)

Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [88]:
indA.symmetric_difference(indB)

Index([1, 2, 9, 11], dtype='int64')

## Data Indexing and Selection

### Data selection in series

In [89]:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [90]:
data['c']

0.75

In [91]:
'a' in data

True

In [97]:
# data.keys()
list(data.items())

[('a', 0.25), ('b', 0.5), ('c', 0.75), ('d', 1.0)]

In [98]:
data[1:3]

b    0.50
c    0.75
dtype: float64

In [99]:
# slicing by explicit index
data['a':'c']

a    0.25
b    0.50
c    0.75
dtype: float64

In [100]:
# slicing by implicit integer index
data[0:2]

a    0.25
b    0.50
dtype: float64

In [103]:
# masking
data[(data > 0.3) & (data < 0.8)]

b    0.50
c    0.75
dtype: float64

In [104]:
# fancy indexing
data[['a', 'd']]

a    0.25
d    1.00
dtype: float64

### Indexers: loc, iloc
These slicing and indexing conventions can be a source of confusion. For example, if your Series has an explicit integer index, an indexing operation such as data[1] will use the explicit indices, while a slicing operation like data[1:3] will use the implicit Python-style index.

* loc: selects by label (the explicit index)
* iloc: selects by integer position (the implicit Python-style index)

In [105]:
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])
data

1    a
3    b
5    c
dtype: object

In [None]:
# explicit index when indexing
data[1]

'a'

In [122]:
# implicit index when slicing
data[1:3]

3    b
5    c
dtype: object


Because of this potential confusion in the case of integer indexes, Pandas provides some special indexer attributes that explicitly expose certain indexing schemes. These are not functional methods, but attributes that expose a particular slicing interface to the data in the Series.

First, the loc attribute allows indexing and slicing that always references the explicit index:

In [109]:
data.loc[1]

'a'

In [110]:
data.loc[1:3]

1    a
3    b
dtype: object


The iloc attribute allows indexing and slicing that always references the implicit Python-style index:

In [111]:
data.iloc[0]

'a'

In [112]:
data.iloc[1:3]

3    b
5    c
dtype: object
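Two further differences between the indexers are easy to trip over with an integer index like this one. The sketch below reuses the same three-element Series: iloc accepts negative positions counted from the end, while loc treats -1 as a label and raises KeyError when no such label exists; both accept lists, interpreted as labels for loc and as positions for iloc.

```python
import pandas as pd

data = pd.Series(["a", "b", "c"], index=[1, 3, 5])

# iloc accepts negative positions, counting from the end like a Python list
print(data.iloc[-1])   # c

# loc only looks up labels; -1 is not a label here, so it raises KeyError
try:
    data.loc[-1]
except KeyError:
    print("no label -1")

# Both indexers accept lists: labels for loc, positions for iloc
print(data.loc[[1, 5]].tolist())   # ['a', 'c']
print(data.iloc[[0, 2]].tolist())  # ['a', 'c']
```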

### Data Selection in DataFrame

In [122]:
states = states.T.copy()

In [123]:
states

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995


In [125]:
states['area'] is states.area

True

In [126]:
# states = states.rename(columns={'population':'pop', 'area':'Area'})
states.rename(columns={'population':'pop'}, inplace=True)
states

                 pop    area
California  38332521  423967
Texas       26448193  695662
New York    19651127  141297
Florida     19552860  170312
Illinois    12882135  149995


In [127]:
states['pop'] is states.pop  # False: states.pop is the DataFrame.pop method, not the 'pop' column

False

In [128]:
states['density'] = states['pop']/states['area']
# states['avg population'] = np.nan
states

                 pop    area     density
California  38332521  423967   90.413926
Texas       26448193  695662   38.018740
New York    19651127  141297  139.076746
Florida     19552860  170312  114.806121
Illinois    12882135  149995   85.883763


In [130]:
states.columns

Index(['pop', 'area', 'density'], dtype='object')

In [131]:
states.values

array([[3.83325210e+07, 4.23967000e+05, 9.04139261e+01],
       [2.64481930e+07, 6.95662000e+05, 3.80187404e+01],
       [1.96511270e+07, 1.41297000e+05, 1.39076746e+02],
       [1.95528600e+07, 1.70312000e+05, 1.14806121e+02],
       [1.28821350e+07, 1.49995000e+05, 8.58837628e+01]])

In [None]:
# states.loc['California']
# states.loc['California', ['area', 'density']]
# states.loc[:, 'area']
states.loc['California': 'Florida', ['area', 'pop']]


              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860


In [137]:
states

                 pop    area     density
California  38332521  423967   90.413926
Texas       26448193  695662   38.018740
New York    19651127  141297  139.076746
Florida     19552860  170312  114.806121
Illinois    12882135  149995   85.883763


In [None]:
# states.iloc[0]
# states.iloc[0, 0]
# states.iloc[:, 0]
# states.iloc[0:3, [0, 1]]
# states.iloc[1:4, [1, 2]]

states.iloc[::2, [0, 1]]  # every second row, first two columns

                 pop    area
California  38332521  423967
New York    19651127  141297
Illinois    12882135  149995


### Filtering Data

In [145]:

# states.loc[states['density'] > 100] 
states.loc[states['density'] > 100, ['area', 'pop']] 

            area       pop
New York  141297  19651127
Florida   170312  19552860


In [150]:
states[states['density'] > 100][['area', 'density']]

            area     density
New York  141297  139.076746
Florida   170312  114.806121
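The same row filter can also be written with DataFrame.query, which takes the condition as a string expression. A minimal sketch, rebuilding the states table from the figures above, checks that the mask-plus-loc form and the query form select the same rows and columns. When assigning values, prefer the single .loc[mask, columns] form: chained indexing such as states[mask]['area'] = ... may write into a temporary copy rather than the original DataFrame.

```python
import pandas as pd

states = pd.DataFrame(
    {"pop": [38332521, 26448193, 19651127, 19552860, 12882135],
     "area": [423967, 695662, 141297, 170312, 149995]},
    index=["California", "Texas", "New York", "Florida", "Illinois"],
)
states["density"] = states["pop"] / states["area"]

# Boolean mask + loc: rows and columns selected in one step
mask_result = states.loc[states["density"] > 100, ["area", "pop"]]

# DataFrame.query expresses the same row filter as a string
query_result = states.query("density > 100")[["area", "pop"]]

print(query_result.index.tolist())  # ['New York', 'Florida']
```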


In [None]:
bios = pd.read_csv('https://raw.githubusercontent.com/DamilareArise/pandas_work/refs/heads/main/datasets/bios.csv')
bios.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145500 entries, 0 to 145499
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   athlete_id    145500 non-null  int64  
 1   name          145500 non-null  object 
 2   born_date     143693 non-null  object 
 3   born_city     110908 non-null  object 
 4   born_region   110908 non-null  object 
 5   born_country  110908 non-null  object 
 6   NOC           145499 non-null  object 
 7   height_cm     106651 non-null  float64
 8   weight_kg     102070 non-null  float64
 9   died_date     33940 non-null   object 
dtypes: float64(2), int64(1), object(7)
memory usage: 11.1+ MB


In [153]:
bios.head()

,athlete_id,name,born_date,born_city,born_region,born_country,NOC,height_cm,weight_kg,died_date
0,1,Jean-François Blanchy,1886-12-12,Bordeaux,Gironde,FRA,France,,,1960-10-02
1,2,Arnaud Boetsch,1969-04-01,Meulan,Yvelines,FRA,France,183.0,76.0,
2,3,Jean Borotra,1898-08-13,Biarritz,Pyrénées-Atlantiques,FRA,France,183.0,76.0,1994-07-17
3,4,Jacques Brugnon,1895-05-11,Paris VIIIe,Paris,FRA,France,168.0,64.0,1978-03-20
4,5,Albert Canet,1878-04-17,Wandsworth,England,GBR,France,,,1930-07-25


In [156]:
# bios[bios['height_cm'] > 215][['name', 'height_cm']]
# or 
# bios.loc[bios['height_cm'] > 215, ['name', 'height_cm']]

bios[bios['height_cm'] > 215][['name', 'height_cm']].sort_values('height_cm', ascending=False)


                      name  height_cm
89070             Yao Ming      226.0
5781        Tommy Burleson      223.0
6978       Arvydas Sabonis      223.0
5673        Gunther Behnke      221.0
120266        Zhang Zhaoxu      221.0
89075       Roberto Dueñas      221.0
5089    Viktor Pankrashkin      220.0
7188    Vladimir Tkachenko      220.0
6504           Luc Longley      220.0
118676     Dmitry Musersky      219.0


In [157]:
bios[(bios['height_cm'] > 215) & (bios['born_country'] == 'USA') ][['name', 'height_cm', 'born_country']]

                    name  height_cm  born_country
5781      Tommy Burleson      223.0           USA
6722    Shaquille O'Neal      216.0           USA
6937      David Robinson      216.0           USA
123850    Tyson Chandler      216.0           USA


In [166]:
bios['First_name'] = bios['name'].str.split(' ').str[0]

In [168]:
bios.head()

,athlete_id,name,born_date,born_city,born_region,born_country,NOC,height_cm,weight_kg,died_date,First_name
0,1,Jean-François Blanchy,1886-12-12,Bordeaux,Gironde,FRA,France,,,1960-10-02,Jean-François
1,2,Arnaud Boetsch,1969-04-01,Meulan,Yvelines,FRA,France,183.0,76.0,,Arnaud
2,3,Jean Borotra,1898-08-13,Biarritz,Pyrénées-Atlantiques,FRA,France,183.0,76.0,1994-07-17,Jean
3,4,Jacques Brugnon,1895-05-11,Paris VIIIe,Paris,FRA,France,168.0,64.0,1978-03-20,Jacques
4,5,Albert Canet,1878-04-17,Wandsworth,England,GBR,France,,,1930-07-25,Albert


In [175]:
# Using the .str accessor

# bios[bios['name'].str.contains('David')]
# bios[bios['name'].str.contains('^David')]  # ^ anchors the match to the start of the string
# bios[bios['name'].str.contains('David$')]  # $ anchors the match to the end of the string

# bios[bios['name'].str.contains('david', case=False)]
# bios[bios['name'].str.contains('david|keith', case=False)]
# bios[(bios['name'].str.contains('david|keith', case=False)) & (bios['born_country'] == 'USA')]

# Athletes born in the 1990s; na=False treats missing birth dates as non-matches
dob_199x = bios[bios['born_date'].str.contains('^199', na=False)]
dob_199x


,athlete_id,name,born_date,born_city,born_region,born_country,NOC,height_cm,weight_kg,died_date,First_name
102295,103238,Zhang Tianyi,1990-04-24,Dandong,Liaoning,CHN,People's Republic of China,172.0,58.0,,Zhang
102364,103308,Anastasiya Prilepa,1990-03-15,Almaty,Almaty,KAZ,Kazakhstan,161.0,47.0,,Anastasiya
102391,103338,Yvonne Yip,1990-10-22,Hong Kong,Hong Kong,HKG,"Hong Kong, China",157.0,47.0,,Yvonne
103143,104113,Sameera Al-Bitar,1990-02-21,Amman,Amman,JOR,Bahrain,168.0,64.0,,Sameera
103151,104121,Tojohanitra Andriamanjatoarimanana,1990-10-31,,,,Madagascar,140.0,,,Tojohanitra
...,...,...,...,...,...,...,...,...,...,...,...
145487,149214,Yekaterina Kosova,1996-04-25,Moskva (Moscow),Moskva,RUS,ROC,,,,Yekaterina
145490,149217,Sin Ye-Chan,1995-06-13,,,,Republic of Korea,,,,Sin
145493,149220,Landysh Falyakhova,1998-08-31,Dva Polya Artash,Respublika Tatarstan,RUS,ROC,,,,Landysh
145496,149223,Valeriya Merkusheva,1999-09-20,Moskva (Moscow),Moskva,RUS,ROC,168.0,65.0,,Valeriya
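The na=False argument matters because born_date has missing values. Without it, a missing date leaves a missing entry in the mask, and boolean indexing rejects a mask that is not purely True/False. A minimal sketch on a small hypothetical sample (not rows from bios.csv) also shows str.startswith as a regex-free alternative for prefix checks, and the case-insensitive alternation pattern from the commented lines above.

```python
import pandas as pd

# Hypothetical sample; one birth date is missing, as in the bios data
people = pd.DataFrame({
    "name": ["David Robinson", "Keith Carter", "Anna Smith", "Sam Davidson"],
    "born_date": ["1965-08-06", "1924-08-30", None, "1992-01-15"],
})

# Regex anchored at the start; na=False turns the missing date into False
mask = people["born_date"].str.contains("^199", na=False)
print(people.loc[mask, "name"].tolist())   # ['Sam Davidson']

# startswith avoids regular expressions for simple prefix checks
print(people["born_date"].str.startswith("199", na=False).sum())  # 1

# Regex alternation with case-insensitive matching
names = people["name"].str.contains("david|keith", case=False)
print(people.loc[names, "name"].tolist())  # ['David Robinson', 'Keith Carter', 'Sam Davidson']
```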


In [186]:
bios[(bios['born_country'].isin(['USA', "FRA", "GBR"])) & (bios['name'].str.startswith("Keith"))]

,athlete_id,name,born_date,born_city,born_region,born_country,NOC,height_cm,weight_kg,died_date
3505,3517,Keith Wallace,1961-03-29,Preston,England,GBR,Great Britain,165.0,51.0,1999-12-31
12053,12118,Keith Hervey,1898-11-03,Fulham,England,GBR,Great Britain,,,1973-02-22
14577,14674,Keith Harrison,1933-03-28,Birmingham,England,GBR,Great Britain,,,
16166,16281,Keith Reynolds,1963-12-25,Solihull,England,GBR,Great Britain,173.0,68.0,
18734,18862,Keith Sinclair,1945-06-26,Sunderland,England,GBR,Great Britain,190.0,79.0,
29897,30123,Keith Langley,1961-06-03,Aldershot,England,GBR,Great Britain,173.0,70.0,
34011,34275,Keith Remfry,1947-11-17,Ealing,England,GBR,Great Britain,193.0,114.0,2015-09-16
46885,47234,Keith Collin,1937-01-18,Marylebone,England,GBR,Great Britain,168.0,63.0,1991-03-06
50929,51288,Keith Carter,1924-08-30,Akron,Ohio,USA,United States,,,2013-05-03
51185,51544,Keith Russell,1948-01-15,Mesa,Arizona,USA,United States,188.0,73.0,


In [None]:
# First 2 rows and first 2 columns
bios.iloc[0:2, [0, 1]]

   athlete_id                   name
0           1  Jean-François Blanchy
1           2         Arnaud Boetsch


### Assignment 2
Using coffee.csv:  
1. Display the first 5 rows of the coffee sales DataFrame.  
2. Select only the "Coffee Type" column from the DataFrame.  
3. Select the "Units Sold" values for Wednesday using label-based indexing.  
4. Select all rows where the day is "Monday".  
5. Select all records where more than 30 units were sold.  
6. Select all rows where the coffee type is "Latte" and units sold are greater than 25.  
7. Filter the DataFrame to show only weekend sales (Saturday and Sunday).  
8. Create a filtered view showing only Espresso sales with units sold between 30 and 40 (inclusive).  
9. Filter for days where Latte sales were exactly 35 units.  
10. Sort the DataFrame by units sold in descending order.  