# Hands-on Lab: Loading data with Pandas

## Objectives
After completing this lab you will be able to:

* Use Pandas to access and view data

## Table of Contents
* About the Dataset
* Introduction of Pandas
* Viewing Data and Accessing Data
* Quiz on DataFrame

## About the Dataset
The table has one row for each album and several columns.

* **artist**: Name of the artist
* **album**: Name of the album
* **released_year**: Year the album was released
* **length_min_sec**: Length of the album (hours:minutes:seconds)
* **genre**: Genre of the album
* **music_recording_sales_millions**: Music recording sales (millions in USD) on [SONG://DATABASE]
* **claimed_sales_millions**: Album's claimed sales (millions in USD) on [SONG://DATABASE]
* **date_released**: Date on which the album was released
* **soundtrack**: Indicates whether the album is a movie soundtrack (Y) or not (N)
* **rating_of_friends**: Indicates the rating from your friends from 1 to 10

You can see the dataset here:

![image.png](attachment:image.png)

In [1]:
import pandas as pd

After the import command, we have access to a large number of pre-built classes and functions. This assumes the library is installed; in our lab environment all the necessary libraries are installed. One way Pandas lets you work with data is the dataframe. Let's go through the process of going from a comma-separated values (<b>.csv</b>) file to a dataframe. The variable <code>csv_path</code> stores the path of the <b>.csv</b> file, which is passed as an argument to the <code>read_csv</code> function. The result is stored in the object <code>df</code>, a common short name for a variable that refers to a Pandas dataframe.

In [2]:
# Store the path of the .csv file, then read it into a dataframe
csv_path = "Book1.csv"
df = pd.read_csv(csv_path)

We can use the method <code>head()</code> to examine the first five rows of a dataframe:

In [4]:
# Print first five rows of the dataframe

df.head()

Unnamed: 0,Artist,Album,Released,Length,Genre,Music recording sales (millions),Claimed sales (millions,Released.1,Soundtrack,Rating (friends
0,Michael Jackson,Thriller,1982,00:42:19,"Pop, rock,R&B",46.0,65,30-Nov-82,,10.0
1,AC/DC,Back in Black,1980,00:42:11,Hard rock,26.1,50,25-Jul-80,,8.5
2,Pink Floyd,The Dark side of the Moon,1973,00:42:49,Progressive rock,24.2,45,01-Mar-73,,9.5
3,Whitney Houston,The Bodyguard,1992,00:57:44,"Soundtrack/R&B, soul, pop",26.1,50,25-Jul-80,Y,7.0
4,Meat Loaf,Bat out of Hell,1977,00:46:33,"Hard rock, progressive rock",20.6,43,21-Oct-77,,7.0
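
The load-and-inspect steps above can be tried without the lab file. The following is a minimal sketch that assumes a small inline sample in place of <code>Book1.csv</code>; the names <code>csv_text</code> and <code>sample_df</code> are hypothetical, and the sample keeps only four of the columns:

```python
import io

import pandas as pd

# Hypothetical inline sample: three rows modeled on the lab's table,
# trimmed to four columns so the example runs without Book1.csv.
csv_text = """Artist,Album,Released,Length
Michael Jackson,Thriller,1982,00:42:19
AC/DC,Back in Black,1980,00:42:11
Pink Floyd,The Dark side of the Moon,1973,00:42:49
"""

# read_csv accepts a file path or any file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows; n defaults to 5.
first_two = sample_df.head(2)
print(first_two)
print(sample_df.shape)
```

Passing a number to <code>head()</code> is handy when five rows are more than you need to check that the file loaded as expected.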


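To read an Excel file instead, pass its path to the function <code>read_excel</code>. The result is a dataframe, as before. The cell below is commented out: <code>Book1.csv</code> is a CSV file, and <code>read_excel</code> expects an Excel workbook such as an <b>.xlsx</b> file:

In [6]:
# read_excel expects an Excel workbook (for example .xlsx), not a .csv file
#df = pd.read_excel("Book1.csv")
#df.head()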

We can access the column **Length** and assign it to a new dataframe **x**:

In [7]:
# Access the column Length

x = df[['Length']]
x

Unnamed: 0,Length
0,00:42:19
1,00:42:11
2,00:42:49
3,00:57:44
4,00:46:33
5,00:43:08
6,01:15:54
7,00:40:01


## Viewing Data and Accessing Data
You can also get a column as a series. You can think of a Pandas series as a 1-D dataframe. Just use a single pair of brackets:

In [8]:
# Get the column as a series

x = df['Length']
x

0    00:42:19
1    00:42:11
2    00:42:49
3    00:57:44
4    00:46:33
5    00:43:08
6    01:15:54
7    00:40:01
Name: Length, dtype: object

You can also get a column as a dataframe by using double brackets. For example, we can assign the column **Artist** to **x** and check its type:

In [9]:
# Get the column as a dataframe

x = df[['Artist']]
type(x)

pandas.core.frame.DataFrame
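
To make the difference between the two bracket styles concrete, here is a small sketch that compares them side by side. It assumes a hypothetical two-row <code>sample_df</code> rather than the lab file:

```python
import pandas as pd

# Hypothetical two-row sample modeled on the lab's table.
sample_df = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC"],
    "Length": ["00:42:19", "00:42:11"],
})

as_series = sample_df["Length"]    # single brackets -> Series (1-D)
as_frame = sample_df[["Length"]]   # double brackets -> DataFrame (2-D)

print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
```

The shapes show the practical difference: the series has only a length, while the dataframe keeps a column dimension even when it holds a single column.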

You can do the same thing for multiple columns: write the dataframe name, in this case <code>df</code>, followed by the list of column headers enclosed in double brackets. The result is a new dataframe containing only the specified columns:

In [10]:
# Access multiple columns

y = df[['Artist','Length','Genre']]
y

Unnamed: 0,Artist,Length,Genre
0,Michael Jackson,00:42:19,"Pop, rock,R&B"
1,AC/DC,00:42:11,Hard rock
2,Pink Floyd,00:42:49,Progressive rock
3,Whitney Houston,00:57:44,"Soundtrack/R&B, soul, pop"
4,Meat Loaf,00:46:33,"Hard rock, progressive rock"
5,Eagles,00:43:08,"Rock, soft rock, folk rock"
6,Bee Gees,01:15:54,Disco
7,Fleetwood Mac,00:40:01,Soft rock
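
The inner brackets are an ordinary Python list, so the column names can also be kept in a variable. The sketch below assumes a hypothetical two-row <code>sample_df</code> and a hypothetical list named <code>wanted</code>; it shows that the result follows the order of the list, not the order of the original columns:

```python
import pandas as pd

# Hypothetical two-row sample modeled on the lab's table.
sample_df = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC"],
    "Length": ["00:42:19", "00:42:11"],
    "Genre": ["Pop, rock, R&B", "Hard rock"],
})

# The list of column names controls both which columns are
# selected and the order in which they appear in the result.
wanted = ["Genre", "Artist"]
subset = sample_df[wanted]

print(list(subset.columns))  # ['Genre', 'Artist']
```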


One way to access individual elements is the <code>iloc</code> indexer, which selects by integer position, counting from 0. You can access the 1st row and the 1st column as follows:

In [11]:
# Access the value on the first row and the first column

df.iloc[0, 0]

'Michael Jackson'

You can access the 2nd row and the 1st column as follows:

In [12]:
# Access the value on the second row and the first column

df.iloc[1,0]

'AC/DC'

You can access the 1st row and the 3rd column as follows:

In [13]:
# Access the value on the first row and the third column

df.iloc[0,2]

1982

You can access the 2nd row and the 3rd column as follows:

In [14]:
# Access the value on the second row and the third column
df.iloc[1,2]

1980

You can also access values by row label and column name using <code>loc</code>. The following return the same values as above:

In [15]:
# Access the value using the row label and the column name

df.loc[0, 'Artist']

'Michael Jackson'

In [16]:
# Access the value using the row label and the column name

df.loc[1, 'Artist']

'AC/DC'

In [17]:
# Access the value using the row label and the column name

df.loc[0, 'Released']

1982

In [18]:
# Access the value using the row label and the column name

df.loc[1, 'Released']

1980
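
With the default index, row labels and row positions are the same numbers, so <code>loc</code> and <code>iloc</code> can look interchangeable. The sketch below, which assumes a hypothetical three-row <code>sample_df</code>, sorts the rows to show where the two diverge: <code>iloc</code> follows position, while <code>loc</code> follows the label that travels with each row.

```python
import pandas as pd

# Hypothetical three-row sample modeled on the lab's table.
sample_df = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC", "Pink Floyd"],
    "Album": ["Thriller", "Back in Black", "The Dark side of the Moon"],
    "Released": [1982, 1980, 1973],
})

# Default index: label 1 and position 1 refer to the same row.
same_row = sample_df.loc[1, "Released"] == sample_df.iloc[1, 2]

# Sorting reorders the rows but each row keeps its original label.
by_year = sample_df.sort_values("Released")

first_by_position = by_year.iloc[0, 0]     # 'Pink Floyd' (earliest year)
first_by_label = by_year.loc[0, "Artist"]  # 'Michael Jackson' (label 0)

print(same_row, first_by_position, first_by_label)
```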

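You can slice using either integer positions (<code>iloc</code>) or row labels and column names (<code>loc</code>). With <code>iloc</code> the end of the slice is excluded, while with <code>loc</code> it is included, which is why the second result below has one more row: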

In [19]:
# Slicing the dataframe

df.iloc[0:2, 0:3]

Unnamed: 0,Artist,Album,Released
0,Michael Jackson,Thriller,1982
1,AC/DC,Back in Black,1980


In [20]:
# Slicing the dataframe using name

df.loc[0:2, 'Artist':'Released']

Unnamed: 0,Artist,Album,Released
0,Michael Jackson,Thriller,1982
1,AC/DC,Back in Black,1980
2,Pink Floyd,The Dark side of the Moon,1973
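
The different row counts in the two slices above come from how each indexer treats the end of a slice. This sketch reproduces the comparison on a hypothetical three-row <code>sample_df</code>:

```python
import pandas as pd

# Hypothetical three-row sample modeled on the lab's table.
sample_df = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC", "Pink Floyd"],
    "Album": ["Thriller", "Back in Black", "The Dark side of the Moon"],
    "Released": [1982, 1980, 1973],
})

# iloc slices by position and excludes the end: rows 0 and 1.
by_position = sample_df.iloc[0:2, 0:3]

# loc slices by label and includes the end: rows 0, 1 and 2.
by_label = sample_df.loc[0:2, "Artist":"Released"]

print(by_position.shape)  # (2, 3)
print(by_label.shape)     # (3, 3)
```

The same rule applies to the columns: <code>0:3</code> stops before position 3, whereas <code>'Artist':'Released'</code> includes the **Released** column.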
