# Introduction to Pandas

Pandas objects can be thought of as Excel tables in which rows and columns are identified by labels rather than by numbers and letters, and individual cells (or groups of cells) can be referenced by labels and queries.
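To make the analogy concrete, here is a minimal sketch that builds a small DataFrame by hand, with values copied from the raag table used in this notebook. The variable name `table` is only for illustration. It references one cell by its row and column labels, and a group of cells by a query.

```python
import pandas as pd

# A tiny hand-made table: rows are labelled by raag name, columns by attribute
table = pd.DataFrame(
    {"Thaat": ["Bhairav", "Kafi", "Purvi"],
     "Time of Performance": ["6am – 9am", "12 am – 3 am", "3 am – 6 am"]},
    index=["Ahir Bhairav", "Bageshri", "Basant"],
)

# Reference a single cell by its row label and column label
print(table.loc["Bageshri", "Thaat"])  # Kafi

# Reference a group of cells with a query: all raags whose Thaat is "Bhairav"
print(table.loc[table["Thaat"] == "Bhairav"])
```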

To use pandas objects, first import pandas into your notebook:

In [44]:
import pandas as pd

Now let's get some data from the internet into a pandas table. The website "http://raagmala.ca/popular-raags-in-indian-classical-music" has data tables about Indian classical music. Look at the table on the website: it has a header row identifying the column names. Let's load the data from that table into a pandas object called a "DataFrame".

![Raagmala](http://raagmala.ca/wp-content/themes/raagmala/images/raagmala-logo.png)
![Read Data](https://pandas.pydata.org/pandas-docs/stable/_images/02_io_readwrite1.svg)

In [45]:
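url = "http://raagmala.ca/popular-raags-in-indian-classical-music"
# read_html returns a list with one DataFrame per <table> found on the page
# (it needs an HTML parser installed: lxml, or BeautifulSoup4 + html5lib)
df = pd.read_html(url)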

That's all there is to it!
Pandas imported all the tables it could find into a list of DataFrames.
You can confirm that df is of type list and that df\[0\], the first table found, is of type DataFrame.

![DataFrame](https://pandas.pydata.org/pandas-docs/stable/_images/01_table_dataframe1.svg)

In [46]:
print("type of df is ",type(df), "and \ntype of df[0] is ", type(df[0]))

type of df is  <class 'list'> and 
type of df[0] is  <class 'pandas.core.frame.DataFrame'>


For easy referencing, let's assign df\[0\] to raagas and see what the first 5 rows contain:

In [47]:
raagas = df[0]
raagas.head()

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | Raag Name | Thaat | Aaroh/Avroh/Vadi/Samvadi | Time of Performance |
| 1 | Ahir Bhairav | Bhairav | S r G m P D n S’S’ n D P m G r S Vaadi: m, Sam... | 6am – 9am |
| 2 | Bageshri | Kafi | S g m D n S’S’ n D m m P D m g r S Vaadi: m, S... | 12 am – 3 am |
| 3 | Basant | Purvi | S m G M’ d N S’S’ N d P M’ G M’ G r S Vaadi: S... | 3 am – 6 am |
| 4 | Bhairav | Bhairav | S r G m P d N S’S’ N d P m G r S Vaadi: d Samv... | 6 am – 9 am |


Our importer did not recognize the column header and read it as the first row of data, so let's start over.

Let's specify that row "0" contains the column headings (`header=0`) and repeat the rest of the steps:

In [48]:
df = pd.read_html(url, header=0)
raagas = df[0]
raagas.head()

|   | Raag Name | Thaat | Aaroh/Avroh/Vadi/Samvadi | Time of Performance |
|---|---|---|---|---|
| 0 | Ahir Bhairav | Bhairav | S r G m P D n S’S’ n D P m G r S Vaadi: m, Sam... | 6am – 9am |
| 1 | Bageshri | Kafi | S g m D n S’S’ n D m m P D m g r S Vaadi: m, S... | 12 am – 3 am |
| 2 | Basant | Purvi | S m G M’ d N S’S’ N d P M’ G M’ G r S Vaadi: S... | 3 am – 6 am |
| 3 | Bhairav | Bhairav | S r G m P d N S’S’ N d P m G r S Vaadi: d Samv... | 6 am – 9 am |
| 4 | Bhairavi | Bhariavi | S r g m P d n S’S’ n d P m g r S Vaadi: m Samv... | 6 am – 9 am |


Can you see the difference? The header row now supplies the column names, and the data starts at row 0.
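The `header` parameter means the same thing in other pandas readers. As a sketch that runs without a network connection, `pd.read_csv` is swapped in here for `read_html` and reads a tiny in-memory CSV (made up from the first rows of the raag table) twice: once with `header=None` and once with `header=0`.

```python
import io
import pandas as pd

csv_text = """Raag Name,Thaat
Ahir Bhairav,Bhairav
Bageshri,Kafi
"""

# header=None: every line is data, and the columns get default integer labels 0, 1, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(no_header.columns.tolist())  # [0, 1]
print(len(no_header))              # 3 (the header line became a data row)

# header=0: line 0 supplies the column labels, the remaining lines are data
with_header = pd.read_csv(io.StringIO(csv_text), header=0)
print(with_header.columns.tolist())  # ['Raag Name', 'Thaat']
print(len(with_header))              # 2
```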

Now we can reference each column, or data "Series", by its name, for example raagas\['Raag Name'\]:

![Series](https://pandas.pydata.org/pandas-docs/stable/_images/01_table_series.svg)

In [49]:
raagas['Raag Name']

0         Ahir Bhairav
1             Bageshri
2               Basant
3              Bhairav
4             Bhairavi
5           Bhimpalasi
6             Bhoopali
7                Bihag
8      Bilaskhani Todi
9              Bilawal
10      Darbari Kanada
11                Desh
12               Durga
13                Kafi
14              Kalyan
15               Kedar
16             Khamaaj
17               Lalit
18            Malkauns
19               Marwa
20    Miyan ki Malhaar
21       Miyan ki Todi
22              Pahadi
23              Puriya
24      Shuddha Sarang
25            Shankara
26         Shivranjani
27         Tilak Kamod
28              Vibhas
29               Yaman
Name: Raag Name, dtype: object
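A single column name in single brackets returns a Series, while a list of column names returns a DataFrame. The small frame below is a hypothetical stand-in (`raagas_demo`) built from the first rows of the table, used only to show the difference.

```python
import pandas as pd

raagas_demo = pd.DataFrame({
    "Raag Name": ["Ahir Bhairav", "Bageshri", "Basant"],
    "Thaat": ["Bhairav", "Kafi", "Purvi"],
})

# One column name in single brackets -> a Series (one labelled column)
names = raagas_demo["Raag Name"]
print(type(names))

# A list of column names -> a DataFrame holding just those columns
subset = raagas_demo[["Raag Name", "Thaat"]]
print(type(subset))

# Values in a Series are referenced by their index label
print(names[1])  # Bageshri
```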

Which unique Thaats are there in the database?

In [50]:
raagas['Thaat'].unique()

array(['Bhairav', 'Kafi', 'Purvi', 'Bhariavi', 'Kalyan', 'Bilawal',
       'Bhairavi', 'Asavari', 'Khamaaj', 'Marwa', 'Todi', 'Khamaj'],
      dtype=object)
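The list above contains spelling variants of the same Thaat ('Bhariavi' and 'Bhairavi', 'Khamaaj' and 'Khamaj'). A minimal sketch of merging them with `Series.replace` and a mapping follows; which spelling counts as canonical is our assumption. On the real data the same mapping would be applied as `raagas['Thaat'] = raagas['Thaat'].replace(canonical)`.

```python
import pandas as pd

# A few Thaat values as they appear in the scraped table, spelling variants included
thaat = pd.Series(["Bhairav", "Bhariavi", "Bhairavi", "Khamaaj", "Khamaj"])

# Map each variant onto one spelling (the choice of canonical spelling is an assumption)
canonical = {"Bhariavi": "Bhairavi", "Khamaj": "Khamaaj"}
thaat_clean = thaat.replace(canonical)

print(thaat.nunique())               # 5
print(thaat_clean.nunique())         # 3
print(list(thaat_clean.unique()))    # ['Bhairav', 'Bhairavi', 'Khamaaj']
```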

Let's clean our data next.

First, replace the hyphen separator '-' with the en dash ' – ' so that every time range uses the same separator. Then split "Time of Performance" on the en dash to create two new columns, "Perform From" and "Perform Upto".

![Add column](https://pandas.pydata.org/pandas-docs/stable/_images/05_newcolumn_1.svg)

In [51]:
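# Normalise the separator: turn every plain hyphen into an en dash
raagas['Time of Performance'] = raagas['Time of Performance'].str.replace('-', ' – ', regex=False)
# Split on the en dash and trim spaces; .str.get(1) gives NaN when there is no end time
perform_from = raagas['Time of Performance'].str.split('–').str.get(0).str.strip()
perform_upto = raagas['Time of Performance'].str.split('–').str.get(1).str.strip()
raagas['Perform From'] = perform_from
raagas['Perform Upto'] = perform_upto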
raagas

|   | Raag Name | Thaat | Aaroh/Avroh/Vadi/Samvadi | Time of Performance | Perform From | Perform Upto |
|---|---|---|---|---|---|---|
| 0 | Ahir Bhairav | Bhairav | S r G m P D n S’S’ n D P m G r S Vaadi: m, Sam... | 6am – 9am | 6am | 9am |
| 1 | Bageshri | Kafi | S g m D n S’S’ n D m m P D m g r S Vaadi: m, S... | 12 am – 3 am | 12 am | 3 am |
| 2 | Basant | Purvi | S m G M’ d N S’S’ N d P M’ G M’ G r S Vaadi: S... | 3 am – 6 am | 3 am | 6 am |
| 3 | Bhairav | Bhairav | S r G m P d N S’S’ N d P m G r S Vaadi: d Samv... | 6 am – 9 am | 6 am | 9 am |
| 4 | Bhairavi | Bhariavi | S r g m P d n S’S’ n d P m g r S Vaadi: m Samv... | 6 am – 9 am | 6 am | 9 am |
| 5 | Bhimpalasi | Kafi | Sa Ga Ma Pa Ni Sa” Vaadi: Ma Samvaadi: Sa | 4 pm – 6 pm | 4 pm | 6 pm |
| 6 | Bhoopali | Kalyan | S R G P D S’S’ D P G R S Vaadi: G Samvaadi: D | 6 pm – 9 pm | 6 pm | 9 pm |
| 7 | Bihag | Bilawal | ‘N S G m P N S’S’ N D P M’ G m G R S Vaadi: G ... | 9 pm – 12 am | 9 pm | 12 am |
| 8 | Bilaskhani Todi | Bhairavi | S r G P d S’S’ r’ n d m g r S Vaadi: d Samvaad... | 9 am – 12 pm | 9 am | 12 pm |
| 9 | Bilawal | Bilawal | S R G P D N S’S’ N D P DG m RGPM G RS Vaadi: D... | 6 am – 9 am | 6 am | 9 am |
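To see the split on its own, here is a sketch on a few sample strings. The first two come from the table; the third, with a plain hyphen, is made up to show the replacement step. It uses `str.split(..., n=1, expand=True)`, an alternative to calling `split` twice that splits once and returns both pieces as DataFrame columns.

```python
import pandas as pd

times = pd.Series(["6am – 9am", "12 am – 3 am", "4 pm - 6 pm"])

# Make the separator uniform: turn the plain hyphen into an en dash
times = times.str.replace("-", "–", regex=False)

# Split once on the en dash; expand=True returns the pieces as columns 0 and 1
parts = times.str.split("–", n=1, expand=True)

# Trim the spaces left around the separator
perform_from = parts[0].str.strip()
perform_upto = parts[1].str.strip()

print(perform_from.tolist())  # ['6am', '12 am', '4 pm']
print(perform_upto.tolist())  # ['9am', '3 am', '6 pm']
```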


We can now drop the "Time of Performance" column.

`axis=1` tells pandas to drop a column rather than a row, and `inplace=True` tells pandas to modify raagas directly instead of returning a new DataFrame.

In [52]:
raagas.drop(['Time of Performance'], axis=1, inplace=True)
raagas

|   | Raag Name | Thaat | Aaroh/Avroh/Vadi/Samvadi | Perform From | Perform Upto |
|---|---|---|---|---|---|
| 0 | Ahir Bhairav | Bhairav | S r G m P D n S’S’ n D P m G r S Vaadi: m, Sam... | 6am | 9am |
| 1 | Bageshri | Kafi | S g m D n S’S’ n D m m P D m g r S Vaadi: m, S... | 12 am | 3 am |
| 2 | Basant | Purvi | S m G M’ d N S’S’ N d P M’ G M’ G r S Vaadi: S... | 3 am | 6 am |
| 3 | Bhairav | Bhairav | S r G m P d N S’S’ N d P m G r S Vaadi: d Samv... | 6 am | 9 am |
| 4 | Bhairavi | Bhariavi | S r g m P d n S’S’ n d P m g r S Vaadi: m Samv... | 6 am | 9 am |
| 5 | Bhimpalasi | Kafi | Sa Ga Ma Pa Ni Sa” Vaadi: Ma Samvaadi: Sa | 4 pm | 6 pm |
| 6 | Bhoopali | Kalyan | S R G P D S’S’ D P G R S Vaadi: G Samvaadi: D | 6 pm | 9 pm |
| 7 | Bihag | Bilawal | ‘N S G m P N S’S’ N D P M’ G m G R S Vaadi: G ... | 9 pm | 12 am |
| 8 | Bilaskhani Todi | Bhairavi | S r G P d S’S’ r’ n d m g r S Vaadi: d Samvaad... | 9 am | 12 pm |
| 9 | Bilawal | Bilawal | S R G P D N S’S’ N D P DG m RGPM G RS Vaadi: D... | 6 am | 9 am |
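Without `inplace=True`, `drop` leaves the original untouched and returns a new DataFrame, which many pandas users prefer to assign back (e.g. `raagas = raagas.drop(columns=[...])`). The `columns=` keyword is equivalent to passing the labels with `axis=1`. A minimal sketch on a hypothetical two-row frame (`df_demo`) taken from the table:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Raag Name": ["Bhairav", "Bhoopali"],
    "Time of Performance": ["6 am – 9 am", "6 pm – 9 pm"],
})

# drop() without inplace returns a new DataFrame; df_demo keeps both columns
trimmed = df_demo.drop(columns=["Time of Performance"])
print(trimmed.columns.tolist())  # ['Raag Name']
print(df_demo.columns.tolist())  # ['Raag Name', 'Time of Performance']

# columns=[...] gives the same result as passing the labels with axis=1
print(trimmed.equals(df_demo.drop(["Time of Performance"], axis=1)))  # True
```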


Now you can find evening raagas by selecting the rows whose "Perform From" time contains "pm"!

![Filter](https://pandas.pydata.org/pandas-docs/stable/_images/03_subset_rows.svg)

In [53]:
evening_raagas = raagas[raagas['Perform From'].str.contains("pm")]
evening_raagas

|   | Raag Name | Thaat | Aaroh/Avroh/Vadi/Samvadi | Perform From | Perform Upto |
|---|---|---|---|---|---|
| 5 | Bhimpalasi | Kafi | Sa Ga Ma Pa Ni Sa” Vaadi: Ma Samvaadi: Sa | 4 pm | 6 pm |
| 6 | Bhoopali | Kalyan | S R G P D S’S’ D P G R S Vaadi: G Samvaadi: D | 6 pm | 9 pm |
| 7 | Bihag | Bilawal | ‘N S G m P N S’S’ N D P M’ G m G R S Vaadi: G ... | 9 pm | 12 am |
| 10 | Darbari Kanada | Asavari | S R g m P d n S’S’ d n P m P g m R S Vaadi: R ... | 9 pm | 12 am |
| 11 | Desh | Khamaaj | Sa R m P N S’S’ n D P m D R S Vaadi: P Samvadi: R | 9 pm | 12 am |
| 12 | Durga | Bilawal | S R m P D S’S’ D P m R ‘D S Vaadi:m Samvaadi: S | 9 pm | 12 am |
| 13 | Kafi | Kafi | S R g m P D n S’S’ n D P m g R S Vaadi: P Samv... | 6 pm | 9 pm |
| 14 | Kalyan | Kalyan | S R G P D S’S, N DP, PM’G R S Vaadi: G Samvadi: D | 6 pm | 9 pm |
| 15 | Kedar | Kalyan | S m, m GP, M’PDN S’S’ NDP, M’PD, DPM’Pm, PmRS ... | 9 pm | 12 am |
| 16 | Khamaaj | Khamaaj | S G m P D n S’S’ n D P m G R S Vaadi: G Samvad... | 6 pm | 9 pm |
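The filter works in two steps: `str.contains` builds a boolean Series (a "mask") with one True/False per row, and indexing the DataFrame with that mask keeps only the True rows. A sketch on a hypothetical three-row frame (`demo`) taken from the table:

```python
import pandas as pd

demo = pd.DataFrame({
    "Raag Name": ["Bhairav", "Bhimpalasi", "Bhoopali"],
    "Perform From": ["6 am", "4 pm", "6 pm"],
})

# Step 1: str.contains gives one True/False per row (a boolean mask)
mask = demo["Perform From"].str.contains("pm")
print(mask.tolist())  # [False, True, True]

# Step 2: indexing with the mask keeps only the rows where it is True
evening = demo[mask]
print(evening["Raag Name"].tolist())  # ['Bhimpalasi', 'Bhoopali']
```

If the column had missing values, `str.contains` would return NaN for those rows; passing `na=False` treats them as non-matches.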


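Now we can get a summary of the DataFrame to understand it at a glance. For columns of text, describe() reports the number of non-missing values (count), the number of distinct values (unique), the most frequent value (top) and how often it occurs (freq).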

![Summary](https://pandas.pydata.org/pandas-docs/stable/_images/06_aggregate.svg)

In [54]:
evening_raagas.describe()

|   | Raag Name | Thaat | Aaroh/Avroh/Vadi/Samvadi | Perform From | Perform Upto |
|---|---|---|---|---|---|
| count | 17 | 17 | 17 | 17 | 17 |
| unique | 17 | 7 | 17 | 5 | 4 |
| top | Tilak Kamod | Bilawal | S R g m P D n S’S’ n D P m g R S Vaadi: P Samv... | 6 pm | 9 pm |
| freq | 1 | 4 | 1 | 8 | 8 |
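To check how those four statistics are computed, here is a sketch on a hypothetical four-row frame (`demo`) whose values come from the evening raagas above (Bhimpalasi, Bhoopali, Kedar, Bihag):

```python
import pandas as pd

demo = pd.DataFrame({
    "Thaat": ["Kafi", "Kalyan", "Kalyan", "Bilawal"],
    "Perform From": ["4 pm", "6 pm", "9 pm", "9 pm"],
})

# For text columns describe() returns the rows count, unique, top and freq
summary = demo.describe()
print(summary)

print(summary.loc["unique", "Thaat"])  # 3 distinct Thaats
print(summary.loc["top", "Thaat"])     # Kalyan, the most frequent value
print(summary.loc["freq", "Thaat"])    # 2, how often Kalyan occurs
```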


And you can find morning raagas!

In [55]:
morning_raagas = raagas[raagas['Perform From'].str.contains("am")]
morning_raagas

|   | Raag Name | Thaat | Aaroh/Avroh/Vadi/Samvadi | Perform From | Perform Upto |
|---|---|---|---|---|---|
| 0 | Ahir Bhairav | Bhairav | S r G m P D n S’S’ n D P m G r S Vaadi: m, Sam... | 6am | 9am |
| 1 | Bageshri | Kafi | S g m D n S’S’ n D m m P D m g r S Vaadi: m, S... | 12 am | 3 am |
| 2 | Basant | Purvi | S m G M’ d N S’S’ N d P M’ G M’ G r S Vaadi: S... | 3 am | 6 am |
| 3 | Bhairav | Bhairav | S r G m P d N S’S’ N d P m G r S Vaadi: d Samv... | 6 am | 9 am |
| 4 | Bhairavi | Bhariavi | S r g m P d n S’S’ n d P m g r S Vaadi: m Samv... | 6 am | 9 am |
| 8 | Bilaskhani Todi | Bhairavi | S r G P d S’S’ r’ n d m g r S Vaadi: d Samvaad... | 9 am | 12 pm |
| 9 | Bilawal | Bilawal | S R G P D N S’S’ N D P DG m RGPM G RS Vaadi: D... | 6 am | 9 am |
| 17 | Lalit | Purvi | ‘N r G m M’ m G M’ d N S’S’ Nr’ N d M’m G M’G ... | 3 am | 6 am |
| 18 | Malkauns | Bhairavi | S g m d n S’S’ n d m g S Vaadi: m Samvadi: S | 12 am | 3 am |
| 21 | Miyan ki Todi | Todi | S r g M’ d N S’S’ N d P M g r S Vaadi: d Samva... | 9 am | 12 pm |


And here is a summary of the morning raagas. This time, let's transpose it so that the rows become columns.

In [56]:
morning_raagas.describe().transpose()

|   | count | unique | top | freq |
|---|---|---|---|---|
| Raag Name | 12 | 12 | Bhairav | 1 |
| Thaat | 12 | 7 | Bhairav | 3 |
| Aaroh/Avroh/Vadi/Samvadi | 12 | 12 | S g m d n S’S’ n d m g S Vaadi: m Samvadi: S | 1 |
| Perform From | 12 | 5 | 6 am | 4 |
| Perform Upto | 12 | 6 | 3 am | 3 |
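`.T` is a shorthand for `.transpose()`: rows become columns and columns become rows, so a 4 × 2 summary turns into a 2 × 4 one. A sketch on a hypothetical frame built from three morning raagas:

```python
import pandas as pd

summary = pd.DataFrame({
    "Thaat": ["Bhairav", "Kafi", "Purvi"],
    "Perform From": ["6am", "12 am", "3 am"],
}).describe()

# .T swaps rows and columns, exactly like .transpose()
flipped = summary.T
print(summary.shape, flipped.shape)  # (4, 2) (2, 4)
print(flipped.columns.tolist())      # ['count', 'unique', 'top', 'freq']
```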


Let's view the column details of the raagas DataFrame:

In [57]:
raagas.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 5 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Raag Name                 30 non-null     object
 1   Thaat                     30 non-null     object
 2   Aaroh/Avroh/Vadi/Samvadi  30 non-null     object
 3   Perform From              30 non-null     object
 4   Perform Upto              29 non-null     object
dtypes: object(5)
memory usage: 1.3+ KB
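The output shows that "Perform Upto" has 29 non-null values out of 30, so one raag has no end time. A sketch of how to find such rows with `isna()`, on a hypothetical frame where "Raag X" is a made-up placeholder row; on the real data the equivalent is `raagas[raagas['Perform Upto'].isna()]`.

```python
import pandas as pd

demo = pd.DataFrame({
    "Raag Name": ["Bhairav", "Bihag", "Raag X"],  # "Raag X" is a hypothetical row
    "Perform Upto": ["9 am", "12 am", None],
})

# Count missing values per column (info() shows these as fewer non-null entries)
print(demo.isna().sum())

# Show the rows whose end time is missing
missing = demo[demo["Perform Upto"].isna()]
print(missing["Raag Name"].tolist())  # ['Raag X']
```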


Let's look for morning raagas whose performance starts at 6 am:

![Select specific rows](https://pandas.pydata.org/pandas-docs/stable/_images/03_subset_columns_rows1.svg)

In [58]:
morning_raagas.loc[morning_raagas["Perform From"].str.contains('6'),"Raag Name"]

0     Ahir Bhairav
3          Bhairav
4         Bhairavi
9          Bilawal
28          Vibhas
Name: Raag Name, dtype: object
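`.loc[rows, column]` selects rows and a column in one step. `str.contains('6')` accepts any string with a 6 anywhere in it. A slightly stricter sketch anchors the pattern at the start of the string with `str.match`, so only start times of 6 am qualify, written with or without a space. The frame `demo` is hypothetical, with rows copied from the morning raagas above.

```python
import pandas as pd

demo = pd.DataFrame({
    "Raag Name": ["Ahir Bhairav", "Bageshri", "Bhairav", "Lalit"],
    "Perform From": ["6am", "12 am", "6 am", "3 am"],
})

# str.match anchors the regex at the start of each string;
# \s* allows both "6am" and "6 am"
starts_at_6 = demo["Perform From"].str.match(r"6\s*am")

# .loc[rows, column]: keep the matching rows, return only the "Raag Name" column
print(demo.loc[starts_at_6, "Raag Name"].tolist())  # ['Ahir Bhairav', 'Bhairav']
```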

Which Thaats are common to both the evening and the morning raagas?

In [59]:
list(set(evening_raagas["Thaat"]) & set(morning_raagas["Thaat"]))


['Bilawal', 'Kafi']
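Python sets have no fixed order, so the list above may come out in a different order on another run. A sketch that sorts the result, plus the same intersection done with a pandas `Index`. The two Series hold a few Thaat values taken from the evening and morning outputs above.

```python
import pandas as pd

evening_thaat = pd.Series(["Kafi", "Kalyan", "Bilawal", "Asavari"])
morning_thaat = pd.Series(["Bhairav", "Kafi", "Purvi", "Bilawal"])

# Sort the set intersection for a stable, readable result
common = sorted(set(evening_thaat) & set(morning_thaat))
print(common)  # ['Bilawal', 'Kafi']

# The same intersection with pandas Index objects
common_idx = pd.Index(evening_thaat.unique()).intersection(morning_thaat.unique())
print(sorted(common_idx))  # ['Bilawal', 'Kafi']
```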

**HOMEWORK 1**

We still have too much information packed in the column "Aaroh/Avroh/Vadi/Samvadi". 

As your homework, move the Vadi and Samvadi into their own columns and paste your results into your assignments notebook.

**HOMEWORK 2**

Identify data you want to analyse from the web, a CSV file, or an Excel file.

Create a Jupyter Notebook that pulls and analyses the data.
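For CSV or Excel sources, `pd.read_csv` and `pd.read_excel` play the role `read_html` played above. A minimal sketch with an in-memory CSV, so it runs without a file; the CSV text is made up from rows of the raag table. For a real source you would pass a path or URL instead of `io.StringIO(...)`, and `pd.read_excel` additionally needs an Excel engine such as openpyxl installed for .xlsx files.

```python
import io
import pandas as pd

# Stand-in for a real file: replace io.StringIO(...) with a path or URL
csv_text = "Raag Name,Thaat\nBhoopali,Kalyan\nKedar,Kalyan\nKafi,Kafi\n"
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())

# A first analysis step: how many raags belong to each Thaat
counts = data["Thaat"].value_counts()
print(counts["Kalyan"])  # 2
```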