## Pandas

Pandas is an open-source library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Pandas adds data structures and tools designed to work with table-like data: *Series* and *DataFrames*.
Pandas provides tools for data manipulation:

- cleaning
- exploring
- analysing
- reshaping
- merging
- sorting
- slicing
- aggregation
- imputation

If you are using Anaconda, you do not have to install pandas because it is already included.

### Installing Pandas

Using conda (Anaconda or Miniconda):
```sh
conda install pandas
```

Using pip (Mac, Windows, or Linux):
```sh
pip install pandas
```

Pandas data structures are based on *Series* and *DataFrames*.

A *Series* is a single *column*, and a DataFrame is a *two-dimensional table* made up of a collection of *Series*. To create a pandas Series, we can pass a Python list or a one-dimensional NumPy array to `pd.Series`.
Let us see some examples of Series:

Names Pandas Series

![pandas series](./images/pandas-series-1.png) 

Countries Series

![pandas series](./images/pandas-series-2.png) 

Cities Series

![pandas series](./images/pandas-series-3.png)

As you can see, a pandas Series is just one column of data. If we want multiple columns, we use a DataFrame.

Let us see an example of a pandas DataFrame:

![Pandas data frame](./images/pandas-dataframe-1.png)

A DataFrame is a collection of rows and columns. The table below has many more columns than the example above:

![Pandas data frame](./images/pandas-dataframe-2.png)

Next, we will see how to import pandas and how to create Series and DataFrames using pandas.


### Importing Pandas

```python
import pandas as pd # importing pandas as pd
import numpy  as np # importing numpy as np
```

### Pandas Version

In [402]:
pd.__version__

'1.1.3'

In [415]:
nums = [1, 2, 3, 4,5]
s = pd.Series(nums)
print(s)

0    1
1    2
2    3
3    4
4    5
dtype: int64


### Getting the Index of a Pandas Series

In [422]:
s.index

RangeIndex(start=0, stop=5, step=1)

In [423]:
list(s.index)

[0, 1, 2, 3, 4]

### Creating a Pandas Series with a Custom Index

In [425]:
nums = [1, 2, 3, 4, 5]
s = pd.Series(nums, index=[1, 2, 3, 4, 5])
print(s)

1    1
2    2
3    3
4    4
5    5
dtype: int64


In [426]:
s.index

Int64Index([1, 2, 3, 4, 5], dtype='int64')

In [428]:
nums = [1, 2, 3, 4, 5]
s = pd.Series(nums, index=['A', 'B', 'C', 'D', 'E'])
print(s)

A    1
B    2
C    3
D    4
E    5
dtype: int64


In [429]:
s.index

Index(['A', 'B', 'C', 'D', 'E'], dtype='object')
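
Once a Series has custom labels, values can be looked up either by label or by position. The following is a minimal sketch of that difference, reusing the `'A'` to `'E'` labels from the example above; the variable names are only illustrative.

```python
import pandas as pd

# Same data as the example above: values 1..5 labelled 'A'..'E'
s = pd.Series([1, 2, 3, 4, 5], index=['A', 'B', 'C', 'D', 'E'])

by_label = s.loc['C']         # label-based lookup
by_position = s.iloc[2]       # position-based lookup (third item)
label_slice = s.loc['B':'D']  # label slices include both endpoints

print(by_label, by_position)
print(label_slice)
```

Note that a label slice such as `'B':'D'` includes the end label, unlike a normal Python position slice.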

In [430]:
fruits = ['Orange','Banana','Mango']
fruits = pd.Series(fruits, index=[1, 2, 3])
print(fruits)

1    Orange
2    Banana
3     Mango
dtype: object


In [431]:
fruits = ['Orange','Banana','Mango']
fruits = pd.Series(fruits, index=['O', 'B', 'M'])
print(fruits)

O    Orange
B    Banana
M     Mango
dtype: object


### Creating Pandas Series from a Dictionary

In [432]:
dct = {'name':'Asabeneh','country':'Finland','city':'Helsinki'}
s = pd.Series(dct)
print(s)

name       Asabeneh
country     Finland
city       Helsinki
dtype: object


### Creating a Constant Pandas Series

In [435]:
s = pd.Series(10, index = [1, 2, 3])
print(s)

1    10
2    10
3    10
dtype: int64


### Creating a Pandas Series Using Linspace

In [436]:
s = pd.Series(np.linspace(5, 20, 10)) # linspace(starting, end, items)
print(s)

0     5.000000
1     6.666667
2     8.333333
3    10.000000
4    11.666667
5    13.333333
6    15.000000
7    16.666667
8    18.333333
9    20.000000
dtype: float64



## DataFrames

A pandas DataFrame has both rows and columns. It can be created in several ways.

### Creating DataFrames from List of Lists

In [438]:
data = [
    ['Asabeneh', 'Finland', 'Helsinki'],
    ['David', 'UK', 'London'],
    ['John', 'Sweden', 'Stockholm']
]
df = pd.DataFrame(data, columns=['Names','Country','City'])
df

Unnamed: 0,Names,Country,City
0,Asabeneh,Finland,Helsinki
1,David,UK,London
2,John,Sweden,Stockholm


### Creating DataFrame Using Dictionary

In [439]:
data = {
    'Name': ['Asabeneh', 'David', 'John'],
    'Country': ['Finland', 'UK', 'Sweden'],
    'City': ['Helsinki', 'London', 'Stockholm']
}
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Country,City
0,Asabeneh,Finland,Helsinki
1,David,UK,London
2,John,Sweden,Stockholm


### Creating DataFrames from a List of Dictionaries

In [440]:
data = [
    {'Name': 'Asabeneh', 'Country': 'Finland', 'City': 'Helsinki'},
    {'Name': 'David', 'Country': 'UK', 'City': 'London'},
    {'Name': 'John', 'Country': 'Sweden', 'City': 'Stockholm'}]
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Country,City
0,Asabeneh,Finland,Helsinki
1,David,UK,London
2,John,Sweden,Stockholm
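
The three construction styles above describe the same table. As a quick sanity check, the sketch below builds the table all three ways and compares the results with `DataFrame.equals`; the variable names are illustrative and not part of the original examples.

```python
import pandas as pd

columns = ['Name', 'Country', 'City']

# 1. From a list of lists (column names supplied separately)
from_lists = pd.DataFrame(
    [['Asabeneh', 'Finland', 'Helsinki'],
     ['David', 'UK', 'London'],
     ['John', 'Sweden', 'Stockholm']],
    columns=columns,
)

# 2. From a dictionary of columns
from_dict = pd.DataFrame({
    'Name': ['Asabeneh', 'David', 'John'],
    'Country': ['Finland', 'UK', 'Sweden'],
    'City': ['Helsinki', 'London', 'Stockholm'],
})

# 3. From a list of row dictionaries
from_records = pd.DataFrame([
    {'Name': 'Asabeneh', 'Country': 'Finland', 'City': 'Helsinki'},
    {'Name': 'David', 'Country': 'UK', 'City': 'London'},
    {'Name': 'John', 'Country': 'Sweden', 'City': 'Stockholm'},
])

print(from_lists.equals(from_dict))
print(from_dict.equals(from_records))
```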


## Reading different file formats Using Pandas

### Reading CSV File Using Pandas

#### Loading a CSV file

In [441]:
import pandas as pd

df = pd.read_csv('./datasets/weight-height.csv')
df

Unnamed: 0,Gender,Height,Weight
0,Male,73.847017,241.893563
1,Male,68.781904,162.310473
2,Male,74.110105,212.740856
3,Male,71.730978,220.042470
4,Male,69.881796,206.349801
...,...,...,...
9995,Female,66.172652,136.777454
9996,Female,67.067155,170.867906
9997,Female,63.867992,128.475319
9998,Female,69.034243,163.852461
9999,Female,61.944246,113.649103


#### Data Exploration

Data exploration is the initial stage of data analysis, in which we explore and visualize the data to gain early insights and identify patterns for further analysis.

#### Reading the first few records of a dataset using head()
The head() method returns the first 5 records by default; a different number can be requested by passing an argument.

In [442]:
df.head()

Unnamed: 0,Gender,Height,Weight
0,Male,73.847017,241.893563
1,Male,68.781904,162.310473
2,Male,74.110105,212.740856
3,Male,71.730978,220.04247
4,Male,69.881796,206.349801


When an argument is passed, head() returns that many records. If we pass 10 to head(), we get the first 10 records.

In [443]:
df.head(10)

Unnamed: 0,Gender,Height,Weight
0,Male,73.847017,241.893563
1,Male,68.781904,162.310473
2,Male,74.110105,212.740856
3,Male,71.730978,220.04247
4,Male,69.881796,206.349801
5,Male,67.253016,152.212156
6,Male,68.785081,183.927889
7,Male,68.348516,167.97111
8,Male,67.01895,175.92944
9,Male,63.456494,156.399676


### Reading the last records of a dataset

To explore the last five records of the dataset we use the tail() method. We can get fewer or more records by changing the argument we pass to tail().




In [444]:
df.tail()

Unnamed: 0,Gender,Height,Weight
9995,Female,66.172652,136.777454
9996,Female,67.067155,170.867906
9997,Female,63.867992,128.475319
9998,Female,69.034243,163.852461
9999,Female,61.944246,113.649103


In [445]:
df.tail(10) # tail() method with an argument

Unnamed: 0,Gender,Height,Weight
9990,Female,63.179498,141.2661
9991,Female,62.636675,102.853563
9992,Female,62.077832,138.69168
9993,Female,60.030434,97.687432
9994,Female,59.09825,110.529686
9995,Female,66.172652,136.777454
9996,Female,67.067155,170.867906
9997,Female,63.867992,128.475319
9998,Female,69.034243,163.852461
9999,Female,61.944246,113.649103
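
To experiment with head() and tail() without the CSV file, here is a small sketch on a made-up 8-row DataFrame. The frame `df_small` and its columns are hypothetical stand-ins for the weight-height dataset.

```python
import pandas as pd

# Hypothetical 8-row frame standing in for the weight-height dataset
df_small = pd.DataFrame({
    'id': range(8),
    'value': [v * 10 for v in range(8)],
})

first_three = df_small.head(3)        # first 3 rows
last_two = df_small.tail(2)           # last 2 rows
all_but_last_two = df_small.head(-2)  # negative n: everything except the last 2 rows

print(first_three)
print(last_two)
print(len(all_but_last_two))
```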


### Number of Columns

Knowing the fields (attributes) of a dataset is one part of data exploration. This dataset has only three columns, but most datasets have many more. Therefore, it is good to know how to list the columns and count them.
We will use the DataFrame's .columns attribute to get the column names.

In [446]:
df.columns

Index(['Gender', 'Height', 'Weight'], dtype='object')

### DataFrame shape

The shape of a DataFrame helps us understand the size of the dataset. It gives the number of rows and columns as a tuple.

In [447]:
df.shape

(10000, 3)
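
Because shape is a plain tuple, it can be unpacked directly. The sketch below uses a tiny made-up DataFrame (`df_demo` is hypothetical) to show how shape relates to `len()` and to the columns attribute.

```python
import pandas as pd

# Hypothetical three-row sample with the same columns as the dataset above
df_demo = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male'],
    'Height': [73.8, 66.2, 68.8],
    'Weight': [241.9, 136.8, 162.3],
})

n_rows, n_cols = df_demo.shape    # unpack (rows, columns)
n_cols_alt = len(df_demo.columns) # number of columns from the column index
n_rows_alt = len(df_demo)         # len() of a DataFrame is its row count

print(f'{n_rows} rows, {n_cols} columns')
```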

#### Descriptive Statistics

Descriptive statistics summarize a given dataset, which can represent either an entire population or a sample of it. They are commonly divided into measures of central tendency and measures of variability (spread).

Measures of central tendency include:

- mean
- median
- mode

Measures of variability and of the shape of the distribution include:

- standard deviation
- variance
- minimum
- maximum
- kurtosis
- skewness
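
Each of the measures listed above is available as a Series method. The sketch below computes them on a small made-up sample (the `heights` values are hypothetical). Note that pandas computes `std()` and `var()` as sample statistics (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical sample of five heights in centimeters
heights = pd.Series([160, 165, 165, 170, 180])

summary = {
    'mean': heights.mean(),
    'median': heights.median(),
    'mode': heights.mode().tolist(),  # mode() returns a Series; there can be ties
    'std': heights.std(),             # sample standard deviation (ddof=1)
    'variance': heights.var(),        # sample variance (ddof=1)
    'min': heights.min(),
    'max': heights.max(),
    'kurtosis': heights.kurt(),
    'skewness': heights.skew(),
}

for name, value in summary.items():
    print(f'{name}: {value}')
```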

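The pandas describe() method provides descriptive statistics for a dataset. It takes a few optional arguments (the signature below is from pandas 1.x; the `datetime_is_numeric` parameter was removed in pandas 2.0):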
```py
DataFrame.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)
```



In [448]:
df.describe() # without any argument

Unnamed: 0,Height,Weight
count,10000.0,10000.0
mean,66.36756,161.440357
std,3.847528,32.108439
min,54.263133,64.700127
25%,63.50562,135.818051
50%,66.31807,161.212928
75%,69.174262,187.169525
max,78.998742,269.989699


In [449]:
df.describe(include='all', percentiles=[0.25, 0.5, 0.75, 0.85])

Unnamed: 0,Gender,Height,Weight
count,10000,10000.0,10000.0
unique,2,,
top,Male,,
freq,5000,,
mean,,66.36756,161.440357
std,,3.847528,32.108439
min,,54.263133,64.700127
25%,,63.50562,135.818051
50%,,66.31807,161.212928
75%,,69.174262,187.169525
...,...,...,...


### Get information about the dataset

It is also possible to get some information about the dataset using the info() method. The info() method takes a few optional arguments.

```py
DataFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None, null_counts=None)
```

In [450]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Gender  10000 non-null  object 
 1   Height  10000 non-null  float64
 2   Weight  10000 non-null  float64
dtypes: float64(2), object(1)
memory usage: 234.5+ KB


In [451]:
df.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Gender  10000 non-null  object 
 1   Height  10000 non-null  float64
 2   Weight  10000 non-null  float64
dtypes: float64(2), object(1)
memory usage: 234.5+ KB
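
The info() method prints its report instead of returning it. When the same facts are needed as values, the `dtypes` attribute and `isna().sum()` can be used. A minimal sketch on a made-up DataFrame (`df_demo` is hypothetical):

```python
import pandas as pd

# Hypothetical frame with one missing value in each column
df_demo = pd.DataFrame({
    'Gender': ['Male', None, 'Female'],
    'Height': [73.8, 66.2, float('nan')],
})

column_types = df_demo.dtypes             # data type of each column
missing_per_column = df_demo.isna().sum() # number of missing values per column

print(column_types)
print(missing_per_column)
```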


## Modifying a DataFrame

We can modify a DataFrame in several ways:

- We can create a new DataFrame
- We can create a new column and add it to the DataFrame
- We can remove an existing column from a DataFrame
- We can modify an existing column in a DataFrame
- We can change the data type of column values in the DataFrame

### Creating a DataFrame

As we have seen before, it is possible to create a DataFrame from a list of lists, a list of dictionaries, or a dictionary.

As always, we first import the necessary packages. Now, let's import pandas and numpy, two best friends ever.

In [452]:

import pandas as pd
import numpy as np
data = [
    {"Name": "Asabeneh", "Country":"Finland","City":"Helsinki"},
    {"Name": "David", "Country":"UK","City":"London"},
    {"Name": "John", "Country":"Sweden","City":"Stockholm"},
    {"Name": "Eyob", "Country":"Finland","City":"Espoo"},
    {"Name": "Pawel", "Country":"Poland","City":"Warsaw"},
    {"Name": "Lidiya", "Country":float('NaN'),"City":float('NaN')},
]
df = pd.DataFrame(data)
print(df)

       Name  Country       City
0  Asabeneh  Finland   Helsinki
1     David       UK     London
2      John   Sweden  Stockholm
3      Eyob  Finland      Espoo
4     Pawel   Poland     Warsaw
5    Lidiya      NaN        NaN


### Adding a New Column
Let's add a weight column to the DataFrame.

In [453]:
weights = [74, 78, 69, 71, 102, float('NaN')]
df['Weight'] = weights
df

Unnamed: 0,Name,Country,City,Weight
0,Asabeneh,Finland,Helsinki,74.0
1,David,UK,London,78.0
2,John,Sweden,Stockholm,69.0
3,Eyob,Finland,Espoo,71.0
4,Pawel,Poland,Warsaw,102.0
5,Lidiya,,,


Let's add a height column to the DataFrame as well.

In [454]:
heights = [173, 175, 169, 173, 195, float('NaN')]
df['Height'] = heights
print(df)

       Name  Country       City  Weight  Height
0  Asabeneh  Finland   Helsinki    74.0   173.0
1     David       UK     London    78.0   175.0
2      John   Sweden  Stockholm    69.0   169.0
3      Eyob  Finland      Espoo    71.0   173.0
4     Pawel   Poland     Warsaw   102.0   195.0
5    Lidiya      NaN        NaN     NaN     NaN


As you can see in the DataFrame above, we added the new columns Weight and Height. Let's add one more column called BMI (Body Mass Index), calculated from each person's mass and height. BMI is mass in kilograms divided by height in meters squared: Weight / (Height * Height).

The height is in centimeters, so we should change it to meters. Let's modify the Height column.

### Modifying column values

In [455]:
df['Height'] = df['Height'] * 0.01
df

Unnamed: 0,Name,Country,City,Weight,Height
0,Asabeneh,Finland,Helsinki,74.0,1.73
1,David,UK,London,78.0,1.75
2,John,Sweden,Stockholm,69.0,1.69
3,Eyob,Finland,Espoo,71.0,1.73
4,Pawel,Poland,Warsaw,102.0,1.95
5,Lidiya,,,,


In [456]:
# Using functions makes our code clean, but you can calculate the bmi without a function
def calculate_bmi():
    weights = df['Weight']
    heights = df['Height']
    bmi = []
    for w,h in zip(weights, heights):
        b = w/(h*h)
        bmi.append(b)
    return bmi
    
bmi = calculate_bmi()

In [457]:
df['BMI'] = bmi
df

Unnamed: 0,Name,Country,City,Weight,Height,BMI
0,Asabeneh,Finland,Helsinki,74.0,1.73,24.725183
1,David,UK,London,78.0,1.75,25.469388
2,John,Sweden,Stockholm,69.0,1.69,24.158818
3,Eyob,Finland,Espoo,71.0,1.73,23.722811
4,Pawel,Poland,Warsaw,102.0,1.95,26.824458
5,Lidiya,,,,,
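
The loop above works, but pandas can apply arithmetic to whole columns at once. The sketch below is a vectorized alternative, not the method used above. It reuses the first three weight and height values from the table, and the frame name `people` is hypothetical.

```python
import pandas as pd

# First three weight/height pairs from the table above (height already in meters)
people = pd.DataFrame({
    'Weight': [74, 78, 69],
    'Height': [1.73, 1.75, 1.69],
})

# Column arithmetic is applied element by element; NaN inputs would give NaN results
people['BMI'] = people['Weight'] / people['Height'] ** 2

print(people)
```

The vectorized form avoids the explicit Python loop and is usually faster on large DataFrames.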


### Formatting DataFrame columns
The BMI column values are floats with many digits after the decimal point. Let's round them to one decimal place.

In [458]:
df['BMI'] = round(df['BMI'], 1)
df

Unnamed: 0,Name,Country,City,Weight,Height,BMI
0,Asabeneh,Finland,Helsinki,74.0,1.73,24.7
1,David,UK,London,78.0,1.75,25.5
2,John,Sweden,Stockholm,69.0,1.69,24.2
3,Eyob,Finland,Espoo,71.0,1.73,23.7
4,Pawel,Poland,Warsaw,102.0,1.95,26.8
5,Lidiya,,,,,



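The information in the DataFrame is not yet complete. Let's add birth year and current year columns.

In [459]:
birth_year = ['1769', '1985', '1990', '1983', '1985', float('NaN')]
# 2020 for rows 0-4; row 5 is not in this index, so it becomes NaN in the DataFrame
current_year = pd.Series(2020, index=[0, 1, 2, 3, 4])
df['Birth Year'] = birth_year
df['Current Year'] = current_year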
df

Unnamed: 0,Name,Country,City,Weight,Height,BMI,Birth Year,Current Year
0,Asabeneh,Finland,Helsinki,74.0,1.73,24.7,1769.0,2020.0
1,David,UK,London,78.0,1.75,25.5,1985.0,2020.0
2,John,Sweden,Stockholm,69.0,1.69,24.2,1990.0,2020.0
3,Eyob,Finland,Espoo,71.0,1.73,23.7,1983.0,2020.0
4,Pawel,Poland,Warsaw,102.0,1.95,26.8,1985.0,2020.0
5,Lidiya,,,,,,,


### Deleting a DataFrame Column
#### Deleting Columns

To delete one or more DataFrame columns, we pass the column name(s) to drop() and set axis to 1.

In [460]:
# Let us imagine the Name column is not important.
# This does not change the original DataFrame. To modify the original, add the inplace=True argument.
df.drop('Name', axis=1)

Unnamed: 0,Country,City,Weight,Height,BMI,Birth Year,Current Year
0,Finland,Helsinki,74.0,1.73,24.7,1769.0,2020.0
1,UK,London,78.0,1.75,25.5,1985.0,2020.0
2,Sweden,Stockholm,69.0,1.69,24.2,1990.0,2020.0
3,Finland,Espoo,71.0,1.73,23.7,1983.0,2020.0
4,Poland,Warsaw,102.0,1.95,26.8,1985.0,2020.0
5,,,,,,,


In [461]:
# The original DataFrame has not been changed
df

Unnamed: 0,Name,Country,City,Weight,Height,BMI,Birth Year,Current Year
0,Asabeneh,Finland,Helsinki,74.0,1.73,24.7,1769.0,2020.0
1,David,UK,London,78.0,1.75,25.5,1985.0,2020.0
2,John,Sweden,Stockholm,69.0,1.69,24.2,1990.0,2020.0
3,Eyob,Finland,Espoo,71.0,1.73,23.7,1983.0,2020.0
4,Pawel,Poland,Warsaw,102.0,1.95,26.8,1985.0,2020.0
5,Lidiya,,,,,,,


In [462]:
# Let us imagine the Name column is not important.
# With inplace=True, the column is removed from the original DataFrame itself.
df.drop('Name', axis=1, inplace=True)
df

Unnamed: 0,Country,City,Weight,Height,BMI,Birth Year,Current Year
0,Finland,Helsinki,74.0,1.73,24.7,1769.0,2020.0
1,UK,London,78.0,1.75,25.5,1985.0,2020.0
2,Sweden,Stockholm,69.0,1.69,24.2,1990.0,2020.0
3,Finland,Espoo,71.0,1.73,23.7,1983.0,2020.0
4,Poland,Warsaw,102.0,1.95,26.8,1985.0,2020.0
5,,,,,,,


In [463]:
# We can also use the columns parameter to delete a column. Let us remove the Country column.
# To change the original DataFrame, we should set the inplace argument to True
df.drop(columns='Country')

Unnamed: 0,City,Weight,Height,BMI,Birth Year,Current Year
0,Helsinki,74.0,1.73,24.7,1769.0,2020.0
1,London,78.0,1.75,25.5,1985.0,2020.0
2,Stockholm,69.0,1.69,24.2,1990.0,2020.0
3,Espoo,71.0,1.73,23.7,1983.0,2020.0
4,Warsaw,102.0,1.95,26.8,1985.0,2020.0
5,,,,,,


In [464]:
#### Removing multiple columns
df.drop(['Country','City'], axis=1)

Unnamed: 0,Weight,Height,BMI,Birth Year,Current Year
0,74.0,1.73,24.7,1769.0,2020.0
1,78.0,1.75,25.5,1985.0,2020.0
2,69.0,1.69,24.2,1990.0,2020.0
3,71.0,1.73,23.7,1983.0,2020.0
4,102.0,1.95,26.8,1985.0,2020.0
5,,,,,
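
Since drop() returns a new DataFrame by default, a common alternative to `inplace=True` is to assign the result to a variable. A minimal sketch on a made-up DataFrame (`df_demo` and `trimmed` are hypothetical names):

```python
import pandas as pd

# Hypothetical frame with the same kind of columns as above
df_demo = pd.DataFrame({
    'Country': ['Finland', 'UK'],
    'City': ['Helsinki', 'London'],
    'Weight': [74.0, 78.0],
})

# drop() returns a new DataFrame; df_demo itself is left untouched
trimmed = df_demo.drop(columns=['Country', 'City'])

print(list(trimmed.columns))
print(list(df_demo.columns))
```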


### Deleting Rows

The row with index 5 (the sixth row) does not have complete information, so it is not worth keeping in the dataset. Let's remove it.

In [465]:
# To delete it from the original DataFrame, inplace=True should be included
df.drop(5, axis=0)

Unnamed: 0,Country,City,Weight,Height,BMI,Birth Year,Current Year
0,Finland,Helsinki,74.0,1.73,24.7,1769,2020.0
1,UK,London,78.0,1.75,25.5,1985,2020.0
2,Sweden,Stockholm,69.0,1.69,24.2,1990,2020.0
3,Finland,Espoo,71.0,1.73,23.7,1983,2020.0
4,Poland,Warsaw,102.0,1.95,26.8,1985,2020.0


In [466]:
df

Unnamed: 0,Country,City,Weight,Height,BMI,Birth Year,Current Year
0,Finland,Helsinki,74.0,1.73,24.7,1769.0,2020.0
1,UK,London,78.0,1.75,25.5,1985.0,2020.0
2,Sweden,Stockholm,69.0,1.69,24.2,1990.0,2020.0
3,Finland,Espoo,71.0,1.73,23.7,1983.0,2020.0
4,Poland,Warsaw,102.0,1.95,26.8,1985.0,2020.0
5,,,,,,,
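
When rows should be removed because they contain missing values, rather than because of their index label, `dropna()` can be used instead of drop(). This is a sketch of that alternative on a made-up DataFrame (`df_demo` is hypothetical); it is not what the example above does.

```python
import pandas as pd

# Hypothetical frame whose last row is entirely missing
df_demo = pd.DataFrame({
    'Country': ['Finland', 'UK', float('nan')],
    'Weight': [74.0, 78.0, float('nan')],
})

complete_rows = df_demo.dropna()           # drop rows with any missing value
not_all_empty = df_demo.dropna(how='all')  # drop only rows where every value is missing

print(complete_rows)
print(len(df_demo))  # the original still has all of its rows
```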


### Renaming Columns

In [467]:
# To modify the original DataFrame, inplace=True should be included
df.rename(
    columns={
        "Country": "country",
        "City": "city",
        "Weight":"weight",
        "Height":"height",
        "BMI":'bmi',
        'Birth Year':'birth_year',
        'Current Year':'current_year'
    }
)

Unnamed: 0,country,city,weight,height,bmi,birth_year,current_year
0,Finland,Helsinki,74.0,1.73,24.7,1769.0,2020.0
1,UK,London,78.0,1.75,25.5,1985.0,2020.0
2,Sweden,Stockholm,69.0,1.69,24.2,1990.0,2020.0
3,Finland,Espoo,71.0,1.73,23.7,1983.0,2020.0
4,Poland,Warsaw,102.0,1.95,26.8,1985.0,2020.0
5,,,,,,,


In [468]:
df.rename(columns=str.lower)

Unnamed: 0,country,city,weight,height,bmi,birth year,current year
0,Finland,Helsinki,74.0,1.73,24.7,1769.0,2020.0
1,UK,London,78.0,1.75,25.5,1985.0,2020.0
2,Sweden,Stockholm,69.0,1.69,24.2,1990.0,2020.0
3,Finland,Espoo,71.0,1.73,23.7,1983.0,2020.0
4,Poland,Warsaw,102.0,1.95,26.8,1985.0,2020.0
5,,,,,,,


In [469]:
df

Unnamed: 0,Country,City,Weight,Height,BMI,Birth Year,Current Year
0,Finland,Helsinki,74.0,1.73,24.7,1769.0,2020.0
1,UK,London,78.0,1.75,25.5,1985.0,2020.0
2,Sweden,Stockholm,69.0,1.69,24.2,1990.0,2020.0
3,Finland,Espoo,71.0,1.73,23.7,1983.0,2020.0
4,Poland,Warsaw,102.0,1.95,26.8,1985.0,2020.0
5,,,,,,,
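
The `columns` argument of rename() accepts any function that maps an old name to a new one, not only `str.lower`. The sketch below uses a small helper, `to_snake_case` (a hypothetical name), so that spaces are also replaced with underscores.

```python
import pandas as pd

# Hypothetical one-row frame with column names like those above
df_demo = pd.DataFrame({'Country': ['Finland'], 'Birth Year': [1985], 'BMI': [24.7]})

def to_snake_case(name):
    """Lowercase a column name and replace spaces with underscores."""
    return name.lower().replace(' ', '_')

# rename() returns a new DataFrame; df_demo keeps its original column names
renamed = df_demo.rename(columns=to_snake_case)

print(list(renamed.columns))
```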


## Checking data types of Column values

In [470]:
df.Weight.dtype

dtype('float64')

In [471]:
df['Birth Year'].dtype # it gives a string object; we should change this to a number

dtype('O')

In [472]:
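# Drop the NaN row first, then convert the strings to integers.
# Assigning back re-inserts NaN for row 5, so the column ends up as float64.
df['Birth Year'] = df.drop(5, axis=0)['Birth Year'].astype('int')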
print(df['Birth Year'].dtype) # let's check the data type now

float64


Now same for the current year:

In [473]:
df['Current Year'] = df.drop(5, axis=0)['Current Year'].astype('int')
df['Current Year'].dtype

dtype('float64')
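
The result is float64 because a regular integer column cannot hold NaN. As an alternative sketch, `pd.to_numeric` converts the strings without dropping any row, and the nullable `'Int64'` dtype keeps whole numbers alongside a missing value. The Series below is a shortened, hypothetical copy of the birth year data.

```python
import pandas as pd

# Shortened copy of the birth year data: strings plus one missing value
birth_year = pd.Series(['1769', '1985', '1990', float('nan')])

as_float = pd.to_numeric(birth_year)       # strings -> float64, NaN preserved
as_nullable_int = as_float.astype('Int64') # nullable integers, missing value becomes <NA>

print(as_float.dtype)
print(as_nullable_int)
```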

Now the values in the Birth Year and Current Year columns are numeric (float64 rather than int, because row 5 still holds NaN). We can calculate the age.

In [474]:
ages = df['Current Year'] - df['Birth Year']
ages

0    251.0
1     35.0
2     30.0
3     37.0
4     35.0
5      NaN
dtype: float64

In [475]:
df

Unnamed: 0,Country,City,Weight,Height,BMI,Birth Year,Current Year
0,Finland,Helsinki,74.0,1.73,24.7,1769.0,2020.0
1,UK,London,78.0,1.75,25.5,1985.0,2020.0
2,Sweden,Stockholm,69.0,1.69,24.2,1990.0,2020.0
3,Finland,Espoo,71.0,1.73,23.7,1983.0,2020.0
4,Poland,Warsaw,102.0,1.95,26.8,1985.0,2020.0
5,,,,,,,


In [476]:
df['Ages'] = ages

In [477]:
df

Unnamed: 0,Country,City,Weight,Height,BMI,Birth Year,Current Year,Ages
0,Finland,Helsinki,74.0,1.73,24.7,1769.0,2020.0,251.0
1,UK,London,78.0,1.75,25.5,1985.0,2020.0,35.0
2,Sweden,Stockholm,69.0,1.69,24.2,1990.0,2020.0,30.0
3,Finland,Espoo,71.0,1.73,23.7,1983.0,2020.0,37.0
4,Poland,Warsaw,102.0,1.95,26.8,1985.0,2020.0,35.0
5,,,,,,,,


The person in the first row would be 251 years old. It is unlikely for someone to live that long. Either it is a typo or the data is fabricated. So let's replace that value with an average of plausible ages from the column, leaving out the outlier. To keep the example simple, we average just two of them, 35 and 30.

In [478]:
mean = (35 + 30) / 2
print('Mean: ',mean) #it is good to add some description to the output, so we know what is what

Mean:  32.5


In [479]:
df

Unnamed: 0,Country,City,Weight,Height,BMI,Birth Year,Current Year,Ages
0,Finland,Helsinki,74.0,1.73,24.7,1769.0,2020.0,251.0
1,UK,London,78.0,1.75,25.5,1985.0,2020.0,35.0
2,Sweden,Stockholm,69.0,1.69,24.2,1990.0,2020.0,30.0
3,Finland,Espoo,71.0,1.73,23.7,1983.0,2020.0,37.0
4,Poland,Warsaw,102.0,1.95,26.8,1985.0,2020.0,35.0
5,,,,,,,,


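We can use the iloc indexer to impute the value. It selects by integer position using square brackets: DataFrame.iloc[row, col]. Here row 0, column 7 is the Ages value of the first row.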

In [480]:
df.iloc[0, 7] = (35 + 30) / 2

In [481]:
df

Unnamed: 0,Country,City,Weight,Height,BMI,Birth Year,Current Year,Ages
0,Finland,Helsinki,74.0,1.73,24.7,1769.0,2020.0,32.5
1,UK,London,78.0,1.75,25.5,1985.0,2020.0,35.0
2,Sweden,Stockholm,69.0,1.69,24.2,1990.0,2020.0,30.0
3,Finland,Espoo,71.0,1.73,23.7,1983.0,2020.0,37.0
4,Poland,Warsaw,102.0,1.95,26.8,1985.0,2020.0,35.0
5,,,,,,,,
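
Instead of typing the average by hand, the replacement value can be computed from the column itself. The sketch below is one possible approach, not the method used above: it assumes a hypothetical cutoff of 120 years for a plausible age and averages all four remaining ages, so it gives 34.25 rather than the 32.5 used above.

```python
import pandas as pd

# Ages from the table above, before imputation
ages = pd.Series([251.0, 35.0, 30.0, 37.0, 35.0, float('nan')], name='Ages')

MAX_PLAUSIBLE_AGE = 120  # hypothetical threshold for spotting the outlier

is_outlier = ages > MAX_PLAUSIBLE_AGE      # NaN compares as False
mean_plausible = ages[~is_outlier].mean()  # mean() skips NaN by default

# Replace only the outlier; the missing value stays missing
ages_fixed = ages.mask(is_outlier, mean_plausible)

print(mean_plausible)
print(ages_fixed)
```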


### Boolean Indexing

In [482]:
df[df['Birth Year'] < 1900]

Unnamed: 0,Country,City,Weight,Height,BMI,Birth Year,Current Year,Ages
0,Finland,Helsinki,74.0,1.73,24.7,1769.0,2020.0,32.5
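
Boolean indexing also works with combined conditions. The sketch below reuses a few columns from the table above in a separate, hypothetical frame `df_demo`. Each condition must be wrapped in parentheses, and `&` (and) or `|` (or) is used instead of the Python keywords `and` and `or`.

```python
import pandas as pd

# A few columns copied from the table above
df_demo = pd.DataFrame({
    'Country': ['Finland', 'UK', 'Sweden', 'Finland', 'Poland'],
    'Weight': [74.0, 78.0, 69.0, 71.0, 102.0],
    'Height': [1.73, 1.75, 1.69, 1.73, 1.95],
})

heavy = df_demo[df_demo['Weight'] > 75]

# Both conditions must hold
finnish_and_tall = df_demo[(df_demo['Country'] == 'Finland') & (df_demo['Height'] >= 1.73)]

# isin() is a compact way to match any of several values
uk_or_poland = df_demo[df_demo['Country'].isin(['UK', 'Poland'])]

print(heavy)
print(finnish_and_tall)
print(uk_or_poland)
```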


## Exercises

1. Read the hacker_news.csv file from the data directory
1. Get the first five rows
1. Get the last five rows
1. Get the title column as a pandas Series
1. Count the number of rows and columns
1. Filter the titles which contain python
1. Filter the titles which contain JavaScript
1. Explore the data and make sense of it