<h1>Introduction to Pandas in Python</h1>

<p><strong>Welcome!</strong> This notebook will teach you how to use <code>Pandas</code> in the Python programming language. By the end of this lab, you'll know how to use the <code>Pandas</code> package to view and access data.</p>

<h2 id="pandas">Introduction to <code>Pandas</code></h2>

In [4]:
# Install the dependency needed to read Excel files (for .xlsx files, recent pandas versions need openpyxl instead)

!pip install xlrd



In [8]:
# Import required library

import pandas as pd

After the import command, we have access to a large number of pre-built classes and functions. This assumes the library is installed; in our lab environment, all the necessary libraries are already installed. One way Pandas lets you work with data is the dataframe. Let's go through the process of loading a comma-separated values (<b>.csv</b>) file into a dataframe. The variable <code>csv_path</code> stores the path of the <b>.csv</b> file, which is passed as an argument to the <code>read_csv</code> function. The result is stored in the object <code>df</code>, a common short name for a variable that refers to a Pandas dataframe.

In [26]:
# Read data from CSV file

csv_path = 'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Datasets/TopSellingAlbums.csv'
df = pd.read_csv(csv_path)

We can use the method <code>head()</code> to examine the first five rows of a dataframe: 

In [None]:
# Print first five rows of the dataframe

df.head()
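
To try <code>read_csv</code> and <code>head()</code> without a network connection, a minimal sketch can read CSV text from memory instead of a URL. The names <code>csv_text</code>, <code>sample_df</code>, and <code>first_two</code> are illustrative assumptions, and the three rows are copied from the outputs shown later in this notebook rather than from the full file:

```python
import io

import pandas as pd

# Hypothetical in-memory CSV; the rows mirror values shown later in this notebook
csv_text = """Artist,Album,Released
Michael Jackson,Thriller,1982
AC/DC,Back in Black,1980
Pink Floyd,The Dark Side of the Moon,1973
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO
sample_df = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows; n defaults to 5 when omitted
first_two = sample_df.head(2)
print(first_two)
```

Passing a number to <code>head()</code> is handy when five rows are more than you need to check that the file loaded as expected.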

To read an Excel file, we pass its path to the function <code>read_excel</code>. The result is a dataframe, as before:

In [None]:
# Read data from Excel File and print the first five rows

xlsx_path = 'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Datasets/TopSellingAlbums.xlsx'

df = pd.read_excel(xlsx_path)
df.head()

We can access the column <b>Length</b> and assign it to a new dataframe <b>x</b>:

In [14]:
# Access the column Length as a dataframe

x = df[['Length']]
x

   Length
0  0:42:19
1  0:42:11
2  0:42:49
3  0:57:44
4  0:46:33
5  0:43:08
6  1:15:54
7  0:40:01


 The process is shown in the figure: 

<img src="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/DataEgOne.png" width="750" />

<hr>

<h2 id="data">Viewing Data and Accessing Data</h2>

You can also get a column as a series. You can think of a Pandas series as a one-dimensional dataframe. To get one, use a single pair of brackets:

In [15]:
# Get the column as a series

x = df['Length']
x

0    0:42:19
1    0:42:11
2    0:42:49
3    0:57:44
4    0:46:33
5    0:43:08
6    1:15:54
7    0:40:01
Name: Length, dtype: object

Double brackets return a dataframe instead, even for a single column. For example, we can select the column <b>Artist</b> and check the type of the result:

In [16]:
# Check that double brackets return a dataframe

x = type(df[['Artist']])
x

pandas.core.frame.DataFrame
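
The difference between single and double brackets can be checked side by side. The sketch below builds a small stand-in dataframe (the name <code>sample_df</code> is an assumption, and the rows are the first three from the outputs above) and compares the type and shape of each result:

```python
import pandas as pd

# Small stand-in dataframe; values copied from the outputs above
sample_df = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC", "Pink Floyd"],
    "Length": ["0:42:19", "0:42:11", "0:42:49"],
})

as_series = sample_df["Length"]    # single brackets return a Series
as_frame = sample_df[["Length"]]   # double brackets return a DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```

The series has one dimension, while the dataframe keeps two dimensions even though it holds a single column.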

You can do the same thing for multiple columns: put the dataframe name, in this case <code>df</code>, followed by the column headers enclosed in double brackets. The result is a new dataframe consisting of the specified columns:

In [17]:
# Access multiple columns

y = df[['Artist','Length','Genre']]
y

   Artist           Length   Genre
0  Michael Jackson  0:42:19  pop, rock, R&B
1  AC/DC            0:42:11  hard rock
2  Pink Floyd       0:42:49  progressive rock
3  Whitney Houston  0:57:44  R&B, soul, pop
4  Meat Loaf        0:46:33  hard rock, progressive rock
5  Eagles           0:43:08  rock, soft rock, folk rock
6  Bee Gees         1:15:54  disco
7  Fleetwood Mac    0:40:01  soft rock
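
When selecting several columns, the new dataframe follows the order of the list you pass, not the order in the original. A minimal sketch, assuming a small stand-in dataframe named <code>sample_df</code> with rows copied from the outputs in this notebook:

```python
import pandas as pd

# Small stand-in dataframe; values copied from the outputs in this notebook
sample_df = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC", "Pink Floyd"],
    "Album": ["Thriller", "Back in Black", "The Dark Side of the Moon"],
    "Length": ["0:42:19", "0:42:11", "0:42:49"],
})

# The inner list names the columns; the result follows the list's order
subset = sample_df[["Length", "Artist"]]

print(list(subset.columns))     # ['Length', 'Artist']
print(list(sample_df.columns))  # ['Artist', 'Album', 'Length']
```

The original dataframe is left unchanged; the selection produces a separate object.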


The process is shown in the figure:

<img src = "https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/DataEgTwo.png" width="1100" />

One way to access individual elements is the <code>iloc</code> indexer, which selects by integer position, counting from zero. You can access the 1st row and the 1st column as follows:

In [19]:
# Access the value on the first row and the first column

df.iloc[0, 0]

'Michael Jackson'

You can access the 2nd row and the 1st column as follows:

In [18]:
# Access the value on the second row and the first column

df.iloc[1,0]

'AC/DC'

You can access the 1st row and the 3rd column as follows: 

In [None]:
# Access the value on the first row and the third column

df.iloc[0,2]

You can also access a value by its row label and column name using <code>loc</code>. The first three examples below return the same values as the <code>iloc</code> examples above:

In [20]:
# Access the value in the first row of the column Artist

df.loc[0, 'Artist']

'Michael Jackson'

In [21]:
# Access the value in the second row of the column Artist

df.loc[1, 'Artist']

'AC/DC'

In [22]:
# Access the value in the first row of the column Released

df.loc[0, 'Released']

1982

In [23]:
# Access the value in the second row of the column Released

df.loc[1, 'Released']

1980
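
In <code>df</code> the row labels happen to match the row positions, so <code>loc</code> and <code>iloc</code> look interchangeable for rows. The sketch below uses hypothetical string labels (<code>"a"</code>, <code>"b"</code>, <code>"c"</code>, not part of the original data) to show that <code>iloc</code> works by position while <code>loc</code> works by label:

```python
import pandas as pd

# Stand-in dataframe with hypothetical string row labels
sample_df = pd.DataFrame(
    {
        "Artist": ["Michael Jackson", "AC/DC", "Pink Floyd"],
        "Released": [1982, 1980, 1973],
    },
    index=["a", "b", "c"],
)

by_position = sample_df.iloc[1, 0]        # second row, first column
by_label = sample_df.loc["b", "Artist"]   # row labelled "b", column "Artist"

print(by_position)  # AC/DC
print(by_label)     # AC/DC
```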

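You can slice a dataframe by integer position with <code>iloc</code> or by row label and column name with <code>loc</code>. Note that <code>iloc</code> excludes the end of the slice, while <code>loc</code> includes it:

In [24]:
# Slice the dataframe by position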

df.iloc[0:2, 0:3]

   Artist           Album          Released
0  Michael Jackson  Thriller       1982
1  AC/DC            Back in Black  1980


In [25]:
# Slice the dataframe by row label and column name

df.loc[0:2, 'Artist':'Released']

   Artist           Album                      Released
0  Michael Jackson  Thriller                   1982
1  AC/DC            Back in Black              1980
2  Pink Floyd       The Dark Side of the Moon  1973
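
The two outputs above differ in row count because position-based slices stop before the end value, while label-based slices include it. A minimal sketch of that behavior, assuming a stand-in dataframe named <code>sample_df</code> built from the three rows shown above:

```python
import pandas as pd

# Stand-in dataframe; values copied from the slicing output above
sample_df = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC", "Pink Floyd"],
    "Album": ["Thriller", "Back in Black", "The Dark Side of the Moon"],
    "Released": [1982, 1980, 1973],
})

# iloc: positions 0 and 1 only (end of slice excluded)
by_position = sample_df.iloc[0:2, 0:2]

# loc: labels 0, 1 and 2, and columns Artist through Album (end of slice included)
by_label = sample_df.loc[0:2, "Artist":"Album"]

print(by_position.shape)  # (2, 2)
print(by_label.shape)     # (3, 2)
```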


<hr>

<h2 id="quiz">Quiz on DataFrame</h2>

Use a variable <code>q</code> to store the column <b>Rating</b> as a dataframe:

In [13]:
# Write your code below and press Shift+Enter to execute

Double-click __here__ for the solution.

<!-- Your answer is below:
q = df[['Rating']]
q
-->

Assign to the variable <code>q</code> the dataframe made up of the columns <b>Released</b> and <b>Artist</b>:

In [None]:
# Write your code below and press Shift+Enter to execute

Double-click __here__ for the solution.

<!-- Your answer is below:
q = df[['Released', 'Artist']]
q
-->

Access the 2nd row and the 3rd column of <code>df</code>:

In [28]:
# Write your code below and press Shift+Enter to execute

Double-click __here__ for the solution.

<!-- Your answer is below:
df.iloc[1, 2]
-->

<h3>About the Authors:</h3>  
<p><a href="https://www.linkedin.com/in/joseph-s-50398b136/" target="_blank">Joseph Santarcangelo</a> is a Data Scientist at IBM, and holds a PhD in Electrical Engineering. His research focused on using Machine Learning, Signal Processing, and Computer Vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.</p>

Other contributors: <a href="https://www.linkedin.com/in/jiahui-mavis-zhou-a4537814a">Mavis Zhou</a>

<p>Copyright &copy; 2018 IBM Developer Skills Network. This notebook and its source code are released under the terms of the <a href="https://cognitiveclass.ai/mit-license/">MIT License</a>.</p>