## Pandas

pandas is one of the world's most popular Python libraries, used for everything from data manipulation to data analysis.

pandas is designed for **tabular data** (data organized in rows and columns).


### What can pandas do for us?

- Load tabular data
- Search for particular rows and columns
- Compute summary statistics
- Combine data from multiple sources

Run the following cell:

In [None]:
!pip3 install pandas

**Importing pandas:**

In [None]:
import pandas as pd

**How do I load a CSV file using pandas?**

Let's mess around with a **Spotify** Dataset.

In [None]:
df = pd.read_csv("top50.csv")
# CSV: comma-separated values
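
Not every "CSV" file actually uses commas, and you do not always need every column. The sketch below reads a small, made-up CSV string (standing in for a file on disk, so the column values are hypothetical) and uses the `sep`, `usecols`, and `nrows` parameters of `pd.read_csv`.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk (not the Spotify dataset).
csv_text = "Track.Name;Genre;Energy\nSong A;pop;55\nSong B;rock;80\nSong C;pop;65\n"

sample = pd.read_csv(
    io.StringIO(csv_text),              # read_csv accepts a path or a file-like object
    sep=";",                            # this file uses semicolons instead of commas
    usecols=["Track.Name", "Energy"],   # keep only these columns
    nrows=2,                            # read only the first two data rows
)
print(sample)
```

With a real file you would pass its path (for example `"top50.csv"`) instead of the `io.StringIO` object.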

**Viewing df**

In [None]:
df

**Inspecting df**

In [None]:
df.head()

In [None]:
df.info()
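
A few other attributes and methods are handy for a first look at a DataFrame. This is a minimal sketch on a tiny, made-up DataFrame called `toy` (hypothetical data whose column names mirror the ones used in this notebook); the same calls should work on `df`.

```python
import pandas as pd

# Hypothetical miniature dataset, invented for illustration.
toy = pd.DataFrame({
    "Track.Name": ["Song A", "Song B", "Song C"],
    "Genre": ["pop", "rock", "pop"],
    "Energy": [55, 80, 65],
})

print(toy.shape)        # (number of rows, number of columns)
print(toy.dtypes)       # data type of each column
summary = toy.describe()
print(summary)          # summary statistics for the numeric columns only
```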

## Selecting Data 

### Subsetting A Column

In [None]:
df['Genre']

### Subsetting multiple columns

In [None]:
df[['Genre', 'Energy']].head()

### The `.loc` indexer

Although selecting data with `[]` works, it's not necessarily the best way. 

The official pandas documentation explains:


>The Python and NumPy indexing operators [] and attribute operator . provide quick and easy access to pandas data structures across a wide range of use cases. This makes interactive work intuitive, as there’s little new to learn if you already know how to deal with Python dictionaries and NumPy arrays. However, since the type of the data to be accessed isn’t known in advance, directly using standard operators has some optimization limits. For production code, we recommended that you take advantage of the optimized pandas data access methods exposed in this chapter.

In [None]:
# select a row
df.loc[12]

In [None]:
# select a cell
df.loc[12, 'Track.Name']

In [None]:
# select a column
df.loc[:, 'Track.Name'].head()
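
`.loc` looks things up by *label*, not by position. That is easy to miss when the row labels happen to be 0, 1, 2, and so on. The sketch below uses a made-up DataFrame with string row labels (the labels `"x"`, `"y"`, `"z"` and the data are hypothetical) to make the label-based lookup visible.

```python
import pandas as pd

# Hypothetical data with string row labels, invented for illustration.
toy = pd.DataFrame(
    {"Track.Name": ["Song A", "Song B", "Song C"], "Genre": ["pop", "rock", "pop"]},
    index=["x", "y", "z"],
)

row = toy.loc["y"]                   # one row, returned as a Series
cell = toy.loc["y", "Track.Name"]    # one cell: row label first, then column label
column = toy.loc[:, "Genre"]         # one column, all rows

print(row)
print(cell)
print(column)
```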

Remember how we could perform slicing on lists?
We can do the same thing with `.loc`, with one difference: a `.loc` slice includes **both** endpoints.

In [None]:
# rows with label 10 to 15
df.loc[10:15]

In [None]:
# columns between Track.Name and Genre
df.loc[:, 'Track.Name':'Genre'].head()
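
One thing to watch out for: a list slice stops *before* its end index, while a `.loc` slice includes the end label. A small sketch with made-up numbers (the `toy` DataFrame is hypothetical) shows the difference.

```python
import pandas as pd

numbers = list(range(20))
toy = pd.DataFrame({"value": numbers})   # default integer row labels 0..19

list_slice = numbers[10:15]   # stops before index 15
loc_slice = toy.loc[10:15]    # includes the row labelled 15

print(len(list_slice))   # 5
print(len(loc_slice))    # 6
```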

✨**Task**

- Load `tweets.csv` into a DataFrame called `new`.
- Inspect the dataset.
- Research how you can drop the NaN values.
- Select the `text` column.
- Select the 100th row using `.loc`.
- Select the cell in the 29th row of the **text** column.

In [None]:
#INSERT YOUR CODE HERE

## Boolean masks


What if we want to select only the rows that meet a certain condition?

For example, let's select only the rows whose genre is "pop".

In [None]:
mask = (df['Genre'] == 'pop')
df[mask]

You can also do this in a one-liner:

In [None]:
df[df['Genre'] == 'pop']

Same result!
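
It helps to look at the mask itself: it is just a Series of `True`/`False` values, one per row. Here is a minimal sketch on a made-up DataFrame (the `toy` data is hypothetical, not the Spotify dataset).

```python
import pandas as pd

# Hypothetical miniature dataset, invented for illustration.
toy = pd.DataFrame({
    "Track.Name": ["Song A", "Song B", "Song C"],
    "Genre": ["pop", "rock", "pop"],
})

mask = toy["Genre"] == "pop"
print(mask.tolist())   # [True, False, True]
print(mask.sum())      # 2, because each True counts as 1

pop_rows = toy[mask]   # keeps only the rows where the mask is True
print(pop_rows)
```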

### Chaining Boolean Masks

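We can use `|` (**OR**) and `&` (**AND**) to chain multiple masks together. Wrap each condition in parentheses, because `|` and `&` bind more tightly than comparisons such as `==`.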

Say we wanted to view rows that qualify as `pop` _or_ were performed by `Katy Perry`: 

In [None]:
df[(df['Genre'] == 'pop') | (df['Artist.Name'] == 'Katy Perry')]
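
Long one-liners get hard to read, so it can help to give each mask a name first. The sketch below does that on made-up data (the `toy` DataFrame and its values are hypothetical) and also shows `~`, which negates a mask.

```python
import pandas as pd

# Hypothetical miniature dataset, invented for illustration.
toy = pd.DataFrame({
    "Genre": ["pop", "rock", "pop", "latin"],
    "Energy": [55, 80, 65, 50],
})

is_pop = toy["Genre"] == "pop"
is_energetic = toy["Energy"] > 60

either = toy[is_pop | is_energetic]   # rows matching at least one condition
not_pop = toy[~is_pop]                # ~ flips True and False

print(either)
print(not_pop)
```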

✨**Task**

Create a chained boolean mask that returns the rows whose `Genre` is **pop** *and* whose `Artist.Name` is **Ed Sheeran**.

In [None]:
#INSERT YOUR CODE HERE

## `groupby()`

Use `groupby()` whenever you want to **analyze** data *by category*.

Let's find out how many songs each artist has in the top 50!

In [None]:
artist_name_groups = df.groupby('Artist.Name')

In [None]:
artist_name_groups

- `ngroups`: attribute that holds the number of groups

In [None]:
artist_name_groups.ngroups

- `groups`: attribute that maps each group name to the row labels belonging to that group

In [None]:
artist_name_groups.groups

- `size()`: method that returns the number of rows in each group

In [None]:
artist_name_groups.size()
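
To see these pieces working together, here is a minimal sketch on a made-up DataFrame (the artist and song names are hypothetical). It sorts the group sizes so that the artist with the most songs comes first.

```python
import pandas as pd

# Hypothetical miniature dataset, invented for illustration.
toy = pd.DataFrame({
    "Artist.Name": ["Artist X", "Artist Y", "Artist X", "Artist Z", "Artist X"],
    "Track.Name": ["Song A", "Song B", "Song C", "Song D", "Song E"],
})

groups = toy.groupby("Artist.Name")
print(groups.ngroups)                    # 3 distinct artists

rows_for_x = list(groups.groups["Artist X"])
print(rows_for_x)                        # [0, 2, 4], the row labels for Artist X

sizes = groups.size().sort_values(ascending=False)
print(sizes)                             # largest group first
```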

Now we want to view all of `Ed Sheeran`'s songs. How, you might ask?

- `get_group()`: method that retrieves one of the created groups as a DataFrame

In [None]:
df_ed_sheeran = artist_name_groups.get_group('Ed Sheeran')

In [None]:
df_ed_sheeran
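
If this looks familiar, that is because `get_group()` picks out the same rows a Boolean mask on the grouping column would. The sketch below compares the two on made-up data (the `toy` DataFrame and its names are hypothetical).

```python
import pandas as pd

# Hypothetical miniature dataset, invented for illustration.
toy = pd.DataFrame({
    "Artist.Name": ["Artist X", "Artist Y", "Artist X", "Artist Z", "Artist X"],
    "Track.Name": ["Song A", "Song B", "Song C", "Song D", "Song E"],
})

via_group = toy.groupby("Artist.Name").get_group("Artist X")
via_mask = toy[toy["Artist.Name"] == "Artist X"]

# Both approaches select the same tracks, with the original row labels preserved.
print(via_group["Track.Name"].tolist())
print(via_mask["Track.Name"].tolist())
```

A mask is usually enough for a single lookup; `get_group()` is convenient once you have already built the groups for other purposes.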

#### **Takeaways from this notebook**

You can now:

- Explain what pandas is used for
- Load a CSV file into a DataFrame
- Use `[]` and `.loc[]` to access rows and columns
- Use Boolean masks to select the rows that meet your desired criteria
- Use `groupby()` to inspect the categories in your data

**Resources**
- [All Pandas groupby() You Should Know for Grouping Data and Performing Operations](https://towardsdatascience.com/all-pandas-groupby-you-should-know-for-grouping-data-and-performing-operations-2a8ec1327b5)
- [Pandas' Documentation](https://pandas.pydata.org/docs/user_guide/index.html)
- [Apply and Lambda usage in pandas](https://towardsdatascience.com/apply-and-lambda-usage-in-pandas-b13a1ea037f7)