### Notes:

#### Packages required:
1. pip install covid
2. Data set: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series


### Methods I came across while doing the project.

#### 1. loc and iloc in Pandas:


1. With loc and iloc you can do practically any data selection operation on DataFrames you can think of. 
2. ***loc*** is ***label-based***, which means that you have to specify rows and columns based on their row and column labels. 
3. ***iloc*** is ***integer index-based***, so you have to specify rows and columns by their integer position.

In [None]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Each pair returns the same result (assuming 'RU' is the row at position 4 and 'AUS' the row at position 1)
print(cars.loc['RU'])
cars.iloc[4]

cars.loc[['RU']]
cars.iloc[[4]]

cars.loc[['RU', 'AUS']]
cars.iloc[[4, 1]]
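
The cell above depends on a local cars.csv. As a minimal self-contained sketch, assuming a hypothetical five-row frame in which 'AUS' sits at position 1 and 'RU' at position 4 (the values are made up for illustration and are not taken from the original file), the same pairs can be compared directly. It also shows that single brackets give back a Series while double brackets give back a DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for cars.csv; the values are illustrative only.
cars = pd.DataFrame(
    {
        "country": ["United States", "Australia", "Japan", "India", "Russia"],
        "drives_right": [True, False, False, False, True],
    },
    index=["US", "AUS", "JPN", "IN", "RU"],
)

# Single brackets return one row as a Series.
row_by_label = cars.loc["RU"]
row_by_position = cars.iloc[4]

# Double brackets return a one-row DataFrame.
frame_by_label = cars.loc[["RU"]]
frame_by_position = cars.iloc[[4]]

# A list selects several rows, in the order given.
two_by_label = cars.loc[["RU", "AUS"]]
two_by_position = cars.iloc[[4, 1]]

print(type(row_by_label).__name__, type(frame_by_label).__name__)
print(two_by_label.equals(two_by_position))
```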

#### loc in Pandas

**loc** is label-based, which means that we have to specify the name of the rows and columns that we need to filter out.


For example, let’s say we search for the rows whose index is 1, 2 or 100. We will not necessarily get the first, second or hundredth row here. Instead, we will get results only for rows whose index label is 1, 2 or 100.

So, we can filter the data using loc even if the index labels in our dataset are not integers.
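
A small sketch of this behaviour, using two hypothetical frames (the names `scores` and `named` and their values are invented for illustration): one has integer labels that do not match the row positions, the other has string labels.

```python
import pandas as pd

# Hypothetical frame whose integer labels do not match the row positions.
scores = pd.DataFrame({"score": [55, 70, 85, 90]}, index=[100, 7, 2, 1])

# loc matches on label: these are the rows *named* 1, 2 and 100.
by_label = scores.loc[[1, 2, 100]]
print(by_label)

# loc works the same way when the labels are strings.
named = pd.DataFrame({"score": [55, 70]}, index=["alice", "bob"])
bob_score = named.loc["bob", "score"]
print(bob_score)
```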

#### iloc in Pandas

**iloc** is integer index-based. So here, we have to specify rows and columns by their integer index.



Let’s say we search for the rows at positions 1, 2 and 100. Because positions are counted from 0, iloc will return the second, third and 101st rows, regardless of the names or labels we have in the index of our dataset.
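
To make the zero-based counting concrete, here is a short sketch on a hypothetical frame (the name `scores` and its values are assumptions for illustration). The same integers are passed to iloc and to loc, and they select different rows:

```python
import pandas as pd

# Hypothetical frame with non-sequential integer labels.
scores = pd.DataFrame({"score": [55, 70, 85, 90]}, index=[100, 7, 2, 1])

# iloc ignores the labels: position 0 is the first row, position 1 the second.
by_position = scores.iloc[[0, 1]]
print(by_position)

# loc reads the same integers as labels, so it finds the rows named 0 or 1.
# Only label 1 exists here, so we ask for that one.
by_label = scores.loc[[1]]
print(by_label)
```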

#### Create a sample dataset


First, we need a dataset to apply loc and iloc, right? Let’s do that.

We will create a sample student dataset consisting of 5 columns: age, section, city, gender, and favorite color.

This dataset will contain both numerical as well as categorical variables:


In [4]:
# importing pandas and numpy
import pandas as pd
import numpy as np

# create a sample dataframe
data = pd.DataFrame({
    'age' :     [ 10, 22, 13, 21, 12, 11, 17],
    'section' : [ 'A', 'B', 'C', 'B', 'B', 'A', 'A'],
    'city' :    [ 'Gurgaon', 'Delhi', 'Mumbai', 'Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
    'gender' :  [ 'M', 'F', 'F', 'M', 'M', 'M', 'F'],
    'favourite_color' : [ 'red', np.nan, 'yellow', np.nan, 'black', 'green', 'red']
})

# view the data
data

Unnamed: 0,age,section,city,gender,favourite_color
0,10,A,Gurgaon,M,red
1,22,B,Delhi,F,
2,13,C,Mumbai,F,yellow
3,21,B,Delhi,M,
4,12,B,Mumbai,M,black
5,11,A,Delhi,M,green
6,17,A,Mumbai,F,red


#### Find all the rows based on any condition in a column

One thing we use almost always when we’re exploring a dataset – filtering the data based on a given condition. For example, we might need to find all the rows in our dataset where age is more than x years, or the city is Delhi, and so on.


We can solve these types of queries with a single line of code using pandas.DataFrame.loc[]. We just need to pass the condition within the loc statement.

In [5]:
# Let’s try to find the rows where the value of age is greater than or equal to 15: 

# select all rows with a condition 
data.loc[data.age >= 15]


Unnamed: 0,age,section,city,gender,favourite_color
1,22,B,Delhi,F,
3,21,B,Delhi,M,
6,17,A,Mumbai,F,red


#### Find all the rows with more than one condition

Similarly, we can combine multiple conditions to filter our data, such as finding all the rows where the age is greater than or equal to 12 and the gender is male. Each condition goes in its own parentheses, joined with &:

In [6]:
# select with multiple conditions
data.loc[(data.age >= 12) & (data.gender == 'M')]

Unnamed: 0,age,section,city,gender,favourite_color
3,21,B,Delhi,M,
4,12,B,Mumbai,M,black


#### Select a range of rows using loc


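Using loc, we can also slice the Pandas dataframe over a range of index labels. Unlike ordinary Python slices, a loc slice includes both endpoints, so 1:3 selects the rows labeled 1, 2 and 3 (as you’ll see in the example below). The labels do not have to be numbers; a slice between two string labels works the same way. If the index is not sorted, the slice runs from the first label to the second in the order the rows appear, and pandas raises a KeyError if either label is missing.


To slice by position regardless of the labels, we use iloc instead.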

In [7]:
#slice
data.loc[1:3]

Unnamed: 0,age,section,city,gender,favourite_color
1,22,B,Delhi,F,
2,13,C,Mumbai,F,yellow
3,21,B,Delhi,M,
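
Label slicing is not limited to numeric indices. The following sketch uses a hypothetical frame (`pets`, with invented values) whose string labels are unsorted, and compares a loc slice with the equivalent iloc slice:

```python
import pandas as pd

# Hypothetical frame with unique string labels in no particular order.
pets = pd.DataFrame(
    {"legs": [4, 2, 0, 4]},
    index=["dog", "bird", "snake", "cat"],
)

# Label slices include both endpoints: 'bird' through 'cat' is three rows.
by_label = pets.loc["bird":"cat"]
print(by_label)

# The equivalent positional slice excludes its stop position.
by_position = pets.iloc[1:4]
print(by_label.equals(by_position))
```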


#### Select only required columns with a condition

We can also restrict the result to just the columns we need for the rows that satisfy our condition.

For example, if our dataset contains hundreds of columns and we want to view only a few of them, then we can add a list of columns after the condition within the loc statement itself:

In [8]:

# select few columns with a condition
data.loc[(data.age >= 12), ['city', 'gender']]

Unnamed: 0,city,gender
1,Delhi,F
2,Mumbai,F
3,Delhi,M
4,Mumbai,M
6,Mumbai,F


#### Update the values of a particular column on selected rows

This is one of my favorite hacks in Python Pandas!

We often have to update values in our dataset based on a certain condition. For example, if the value in age is greater than or equal to 20, we want to set section to “S” and city to “Pune”.

We could do this with a for loop, but on a large dataset a row-by-row loop is slow. With loc, the whole update is a single vectorized assignment, which is typically much faster.

We just need to specify the condition, followed by the target columns, and then assign the new values:

In [9]:
# update multiple columns with condition
data.loc[(data.age >= 20), ['section', 'city']] = ['S','Pune']
data

Unnamed: 0,age,section,city,gender,favourite_color
0,10,A,Gurgaon,M,red
1,22,S,Pune,F,
2,13,C,Mumbai,F,yellow
3,21,S,Pune,M,
4,12,B,Mumbai,M,black
5,11,A,Delhi,M,green
6,17,A,Mumbai,F,red
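
For comparison, here is a sketch of the loop-based approach next to the loc assignment, on a small hypothetical frame that is separate from the student dataset (the name `df` and its values are assumptions). It only checks that both approaches produce the same result and makes no timing claims:

```python
import pandas as pd

# Small hypothetical frame, separate from the student dataset above.
df = pd.DataFrame({"age": [10, 22, 13, 21], "section": ["A", "B", "C", "B"]})

# Row-by-row version: one label-based write per matching row.
looped = df.copy()
for label in looped.index:
    if looped.at[label, "age"] >= 20:
        looped.at[label, "section"] = "S"

# Vectorized version: one boolean mask, one assignment.
vectorized = df.copy()
vectorized.loc[vectorized["age"] >= 20, "section"] = "S"

print(looped.equals(vectorized))
```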


#### Select rows with indices using iloc

When we are using iloc, we need to specify the rows and columns by their integer index. If we want to select only the first and third row, we simply need to put this into a list in the iloc statement with our dataframe:

In [18]:
# select rows with indexes
data.iloc[[0,2]]

Unnamed: 0,age,section,city,gender,favourite_color
0,10,A,Gurgaon,M,red
2,13,C,Mumbai,F,yellow


#### Select rows with particular indices and particular columns

Earlier, we selected a few columns from the dataset using loc. We can do the same with iloc. Keep in mind that we need to provide the integer position of each column instead of its name:

In [22]:
# select rows with particular indexes and particular columns
data.iloc[[0,2], [1,4]]

Unnamed: 0,section,favourite_color
0,A,red
2,C,yellow
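
When only the column names are known, their positions can be looked up first. A minimal sketch, assuming a hypothetical three-row frame with the same column names as above, uses `Index.get_loc` to translate each name into its integer position before calling iloc:

```python
import pandas as pd

# Hypothetical frame reusing the column names from the student dataset.
df = pd.DataFrame({
    "age": [10, 22, 13],
    "section": ["A", "B", "C"],
    "city": ["Gurgaon", "Delhi", "Mumbai"],
    "gender": ["M", "F", "F"],
    "favourite_color": ["red", None, "yellow"],
})

# Translate column names into integer positions.
positions = [df.columns.get_loc(name) for name in ["section", "favourite_color"]]
print(positions)

# Use those positions with iloc, together with row positions 0 and 2.
subset = df.iloc[[0, 2], positions]
print(subset)
```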


#### Select a range of rows using iloc

We can slice a dataframe using iloc as well. We provide a start position and a stop position, and the stop position is excluded, so to include the row at end_index we pass end_index+1. Because iloc works on positions, the slice is unaffected by whether the index labels are sorted or even numeric:

In [23]:
# select a range of rows
data.iloc[1:4]

Unnamed: 0,age,section,city,gender,favourite_color
1,22,S,Pune,F,
2,13,C,Mumbai,F,yellow
3,21,S,Pune,M,
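
The student dataset has a sorted index, so it does not show what happens with shuffled labels. A short sketch on a hypothetical frame (`df`, with invented values) illustrates that iloc still slices purely by position:

```python
import pandas as pd

# Hypothetical frame whose integer labels are in shuffled order.
df = pd.DataFrame({"value": ["a", "b", "c", "d", "e"]}, index=[3, 0, 4, 1, 2])

# Positions 1, 2 and 3 are selected; the stop position 4 is excluded.
by_position = df.iloc[1:4]
print(by_position)
```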


#### Select a range of rows and columns using iloc

We can also slice the dataframe over both rows and columns. In the example below, we select the rows at positions 1 and 2 and the columns at positions 2 and 3 (the stop position is excluded in both slices).

In [31]:
# select range of rows and column
data.iloc[1:3,2:4]

Unnamed: 0,city,gender
1,Pune,F
2,Mumbai,F
