In [1]:
import pandas as pd
import numpy as np

In [2]:
# First, let's create a DataFrame from some sample values
df = pd.DataFrame({
    'Date':['01-01-2021','02-01-2021','03-01-2021','04-01-2021','05-01-2021','06-01-2021','07-01-2021'],
    'Day':['Fri','Sat','Sun','Mon','Tue','Wed','Thur'],
    'Weather': ['rainy','rainy','cloudy','windy','sunny','sunny','windy'],
    'Temperature':[25,28,30,23,34,33,24]
})

df

Unnamed: 0,Date,Day,Weather,Temperature
0,01-01-2021,Fri,rainy,25
1,02-01-2021,Sat,rainy,28
2,03-01-2021,Sun,cloudy,30
3,04-01-2021,Mon,windy,23
4,05-01-2021,Tue,sunny,34
5,06-01-2021,Wed,sunny,33
6,07-01-2021,Thur,windy,24


As we can see, this DataFrame has rows and columns, much like a table in a SQL relational database, so we can filter it with conditions in a similar way.
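To make the SQL comparison concrete, here is a minimal sketch on a small stand-in frame (the `sample` name and its values are illustrative, not the notebook's data). A boolean mask plays roughly the role of a `WHERE` clause, and the column list plays the role of `SELECT`.

```python
import pandas as pd

# Illustrative stand-in data, not the notebook's DataFrame
sample = pd.DataFrame({
    'Day': ['Fri', 'Sat', 'Sun'],
    'Weather': ['rainy', 'sunny', 'sunny'],
    'Temperature': [25, 31, 29],
})

# Roughly: SELECT Day, Temperature FROM sample
#          WHERE Weather = 'sunny' AND Temperature > 30
mask = (sample['Weather'] == 'sunny') & (sample['Temperature'] > 30)
hot_sunny = sample.loc[mask, ['Day', 'Temperature']]

# The same filter written with DataFrame.query
hot_sunny_q = sample.query("Weather == 'sunny' and Temperature > 30")[['Day', 'Temperature']]

print(hot_sunny)
```

Note that each condition is wrapped in parentheses and combined with `&` rather than `and`, because the comparison is evaluated element by element.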

### Data Analysis exercise

- Find maximum temperature
- Find the date and weather of the day having maximum temperature
- Find all the dates having windy weather

In [3]:
df.Temperature.max()

34

In [4]:
df.loc[df.Temperature == df.Temperature.max(), ['Date', 'Weather']]

Unnamed: 0,Date,Weather
4,05-01-2021,sunny
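An alternative sketch for the same question uses `idxmax`, which returns the index label of the first maximum. The frame is rebuilt here under the hypothetical name `df_demo` so the block runs on its own. Keep in mind that `idxmax` reports only the first match if several days tie for the maximum, whereas the boolean comparison returns all of them.

```python
import pandas as pd

# Same values as the notebook's DataFrame, rebuilt so this block is self-contained
df_demo = pd.DataFrame({
    'Date': ['01-01-2021', '02-01-2021', '03-01-2021', '04-01-2021',
             '05-01-2021', '06-01-2021', '07-01-2021'],
    'Weather': ['rainy', 'rainy', 'cloudy', 'windy', 'sunny', 'sunny', 'windy'],
    'Temperature': [25, 28, 30, 23, 34, 33, 24],
})

# Index label of the row holding the (first) maximum temperature
hottest_label = df_demo['Temperature'].idxmax()

# Select that single row by label, keeping only the columns of interest
hottest = df_demo.loc[hottest_label, ['Date', 'Weather']]
print(hottest)
```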


In [5]:
df.loc[df.Weather == 'windy', ['Date', 'Weather']]

Unnamed: 0,Date,Weather
3,04-01-2021,windy
6,07-01-2021,windy


### Basic operations using Pandas

In [6]:
df.head()

Unnamed: 0,Date,Day,Weather,Temperature
0,01-01-2021,Fri,rainy,25
1,02-01-2021,Sat,rainy,28
2,03-01-2021,Sun,cloudy,30
3,04-01-2021,Mon,windy,23
4,05-01-2021,Tue,sunny,34


<b>Index Manipulation</b>

In [7]:
df.index

RangeIndex(start=0, stop=7, step=1)

Set some other column as the index

In [8]:
# set_index() returns a new DataFrame and leaves the original unchanged, so the result
# has to be stored somewhere. To see this, comment out the code below and run this instead:

# ---------------------
# df.set_index('Date')
# df.head()

# df still has its integer index, because the returned DataFrame was discarded.
# Later we'll also see how to apply the change to the DataFrame itself without storing the
# result in another DataFrame
# ---------------------

df_new = df.set_index('Date')
df_new.head()

Date,Day,Weather,Temperature
01-01-2021,Fri,rainy,25
02-01-2021,Sat,rainy,28
03-01-2021,Sun,cloudy,30
04-01-2021,Mon,windy,23
05-01-2021,Tue,sunny,34


Earlier we had an integer index, but now the Date column is our index.
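The point about storing the result can be checked directly. This is a small sketch on a two-row stand-in frame (`demo` and `indexed` are illustrative names): the original keeps its integer index, and only the returned DataFrame is indexed by Date.

```python
import pandas as pd

# Two-row stand-in frame, for illustration only
demo = pd.DataFrame({'Date': ['01-01-2021', '02-01-2021'],
                     'Temperature': [25, 28]})

# set_index returns a new DataFrame; demo itself is not modified
indexed = demo.set_index('Date')

print(demo.index)     # the default integer index is still in place
print(indexed.index)  # the Date values are now the index
```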

```loc = location```

Access a group of rows and columns by label(s) or a boolean array.

In [9]:
# Now we can access rows in our DataFrame by their Date label
df_new.loc['03-01-2021']

Day               Sun
Weather        cloudy
Temperature        30
Name: 03-01-2021, dtype: object
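Besides a single row label, `loc` also accepts a list of labels, a label slice, and a boolean array, optionally followed by a column selection. The sketch below shows each form on a three-row stand-in frame (the `weather` name and its contents are illustrative).

```python
import pandas as pd

# Stand-in frame indexed by Date, for illustration only
weather = pd.DataFrame(
    {'Day': ['Fri', 'Sat', 'Sun'],
     'Weather': ['rainy', 'rainy', 'cloudy'],
     'Temperature': [25, 28, 30]},
    index=pd.Index(['01-01-2021', '02-01-2021', '03-01-2021'], name='Date'),
)

# Row label plus column label: a single value
one_value = weather.loc['03-01-2021', 'Temperature']

# List of row labels: a DataFrame with those rows
two_rows = weather.loc[['01-01-2021', '03-01-2021']]

# Label slice: unlike positional slicing, both endpoints are included
label_slice = weather.loc['01-01-2021':'02-01-2021', 'Day']

# Boolean array: rows where the condition is True
warm = weather.loc[weather['Temperature'] > 26]
```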

#### Index resetting to default integers

In [10]:
df_new.reset_index(inplace=True)
df_new

Unnamed: 0,Date,Day,Weather,Temperature
0,01-01-2021,Fri,rainy,25
1,02-01-2021,Sat,rainy,28
2,03-01-2021,Sun,cloudy,30
3,04-01-2021,Mon,windy,23
4,05-01-2021,Tue,sunny,34
5,06-01-2021,Wed,sunny,33
6,07-01-2021,Thur,windy,24


Here the DataFrame 'df_new' had its index changed to 'Date' earlier. Resetting the index reverts that change: Date goes back to being a regular column and the default integers become the index again.

<b>Note: </b>As we saw earlier, when we changed the index we had to store the result in a new DataFrame, because set_index leaves the original DataFrame unchanged.

Now we'll see how to do the same without storing the result in a new DataFrame.

In [11]:
# Our original DataFrame
df.head(2)

Unnamed: 0,Date,Day,Weather,Temperature
0,01-01-2021,Fri,rainy,25
1,02-01-2021,Sat,rainy,28


In [12]:
df.set_index('Date',inplace=True)
df.head(2)

Date,Day,Weather,Temperature
01-01-2021,Fri,rainy,25
02-01-2021,Sat,rainy,28
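For comparison, here is a small sketch showing that `inplace=True` and plain reassignment end in the same state. The frames `a` and `b` are illustrative stand-ins; which style to use is largely a matter of preference, and reassignment tends to read more clearly when calls are chained.

```python
import pandas as pd

# Two identical stand-in frames, for illustration only
a = pd.DataFrame({'Date': ['01-01-2021', '02-01-2021'],
                  'Temperature': [25, 28]})
b = a.copy()

# Style 1: modify the existing object
a.set_index('Date', inplace=True)

# Style 2: rebind the name to the returned DataFrame
b = b.set_index('Date')

print(a.equals(b))
```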


#### Small Exercise
- Reset the ```df``` DataFrame
- Set Weather as the new Index
- Search for all rows using loc for windy weather

In [13]:
# Resetting index
df.reset_index(inplace=True)
df.head(2)

Unnamed: 0,Date,Day,Weather,Temperature
0,01-01-2021,Fri,rainy,25
1,02-01-2021,Sat,rainy,28


In [14]:
# Set Weather Column as new index
df.set_index('Weather',inplace=True)
df.head(2)

Weather,Date,Day,Temperature
rainy,01-01-2021,Fri,25
rainy,02-01-2021,Sat,28


In [15]:
# Search for all rows using loc for windy weather
df.loc['windy']

Weather,Date,Day,Temperature
windy,04-01-2021,Mon,23
windy,07-01-2021,Thur,24
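Because Weather contains repeated values, the index is no longer unique, and the shape of the `loc` result depends on how many rows match. The sketch below illustrates this on a three-row stand-in frame (`by_weather` is an illustrative name): a label that occurs more than once yields a DataFrame, a label that occurs once yields a Series, and wrapping the label in a list yields a DataFrame either way.

```python
import pandas as pd

# Stand-in frame with a non-unique Weather index, for illustration only
by_weather = pd.DataFrame(
    {'Date': ['03-01-2021', '04-01-2021', '07-01-2021'],
     'Temperature': [30, 23, 24]},
    index=pd.Index(['cloudy', 'windy', 'windy'], name='Weather'),
)

windy = by_weather.loc['windy']            # label occurs twice: DataFrame
cloudy = by_weather.loc['cloudy']          # label occurs once: Series
cloudy_frame = by_weather.loc[['cloudy']]  # list of labels: DataFrame

print(type(windy).__name__, type(cloudy).__name__, type(cloudy_frame).__name__)
```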


## Different ways of Creating DataFrame
1. Python dictionary (the one we've been working with until now)
2. Using Excel
3. Using CSV (most commonly used)
4. From a list of tuples
5. From a list of dictionaries
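As a quick preview of the last two options, here is a minimal sketch with made-up rows (the variable names are illustrative). With tuples, the column names have to be supplied separately; with dictionaries, the keys become the column names.

```python
import pandas as pd

# From a list of tuples: one tuple per row, column names passed explicitly
rows_as_tuples = [(1, 'Abhay', 70), (2, 'Abhijeet', 90)]
from_tuples = pd.DataFrame(rows_as_tuples, columns=['Roll no', 'Name', 'Maths'])

# From a list of dictionaries: one dict per row, keys become column names
rows_as_dicts = [
    {'Roll no': 1, 'Name': 'Abhay', 'Maths': 70},
    {'Roll no': 2, 'Name': 'Abhijeet', 'Maths': 90},
]
from_dicts = pd.DataFrame(rows_as_dicts)

print(from_tuples.equals(from_dicts))
```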

<b>Some basic Misc.</b>

Let's see how we'll do all of these
- First we'll create a DataFrame using a Python dictionary
- Then we'll look at how to store that DataFrame in a CSV file as well as an Excel file

In [16]:
import pandas as pd
import numpy as np

# Creating a DataFrame using Python Dictionary
# Let's create a DataFrame for a students' score card

score_card = {'Roll no':[1,2,3,4,5,6,7,8,9,10],
             'Name':['Abhay','Abhijeet','Abhinav','Abhishek','Aditya','Ajaz','Akash','Amit','Amresh','Anand'],
             'Maths':[70,90,60,55,65,58,63,76,72,58],
             'Science':[80,76,89,75,72,68,66,82,73,71],
             }

report_card = pd.DataFrame(score_card)
report_card.head()

Unnamed: 0,Roll no,Name,Maths,Science
0,1,Abhay,70,80
1,2,Abhijeet,90,76
2,3,Abhinav,60,89
3,4,Abhishek,55,75
4,5,Aditya,65,72


Now let's save this DataFrame to a CSV file and an Excel file

In [21]:
# save to csv
report_card.to_csv('report_card.csv',index=False)

# save to excel (xlsx); this needs an Excel writer engine such as openpyxl to be installed
report_card.to_excel('report_card.xlsx',index=False)
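To see what `index=False` changes, here is a sketch that writes a two-row stand-in frame to a temporary folder and reads it back with `read_csv`. The `cards` frame, the file names, and the temporary directory are assumptions made for illustration, so nothing is written next to the notebook. When the index is written out, it comes back as an extra column that pandas labels `Unnamed: 0`.

```python
import os
import tempfile

import pandas as pd

# Two-row stand-in for the report card, for illustration only
cards = pd.DataFrame({'Roll no': [1, 2],
                      'Name': ['Abhay', 'Abhijeet'],
                      'Maths': [70, 90]})

with tempfile.TemporaryDirectory() as tmp:
    no_index_path = os.path.join(tmp, 'report_card.csv')
    with_index_path = os.path.join(tmp, 'report_card_with_index.csv')

    cards.to_csv(no_index_path, index=False)  # only the data columns are written
    cards.to_csv(with_index_path)             # the integer index is written too

    restored = pd.read_csv(no_index_path)
    restored_with_index = pd.read_csv(with_index_path)

print(list(restored.columns))
print(list(restored_with_index.columns))
```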