## Introduction to pandas
In this notebook, we will learn the basics of the `pandas` library. It is a powerful package widely used for working with data in Python. It allows you to inspect your data, manipulate it, and create visualizations.

There is a lot to learn about `pandas`, but we will cover the basics in this notebook. Check out the [official documentation](https://pandas.pydata.org/pandas-docs/stable/reference/io.html) for more information about the available functions and methods.

This lesson covers the following topics:
- Reading data as a DataFrame
- Handling missing data
- Basic operations on a DataFrame: 
    - Filtering data
    - Grouping data
    - Sorting data
- Writing data to a file 


First, you have to import the `pandas` package. You can do this by running the following code:

```python
import pandas as pd
```

`pandas` works with a structure called a `DataFrame`. This is a table with rows and columns. You can think of it as a spreadsheet.
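
To make the rows-and-columns idea concrete, here is a minimal sketch that builds a tiny DataFrame directly from a Python dictionary instead of a file. The three songs come from the dataset used below; the variable name `songs` is only for illustration.

```python
import pandas as pd

# Each dictionary key becomes a column name; each list holds that
# column's values, one value per row.
songs = pd.DataFrame(
    {
        "song": ["Espresso", "Europapa", "Supernova"],
        "artist": ["Sabrina Carpenter", "Joost", "aespa"],
        "language": ["English", "Dutch", "Korean"],
    }
)

print(songs)
print(songs.shape)  # (rows, columns) → (3, 3)
```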

#### Loading and inspecting data

Let's start by loading a CSV file into a DataFrame and saving it to a variable. You can do this with the `read_csv` function, which takes the path to the file as an argument:

```python
df = pd.read_csv('songs.csv')
```
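
If you don't have `songs.csv` at hand, the sketch below shows the same call on an in-memory excerpt. It assumes the file has a header row with the columns `song`, `artist`, `length` and `language`, as the `info()` output further down suggests. The two-row `csv_text` is a hypothetical stand-in, not the real file.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt standing in for songs.csv.
# The length field for "Who" is deliberately left empty.
csv_text = """song,artist,length,language
Europapa,Joost,2:40,Dutch
Who,Jimin,,English
"""

# read_csv accepts a file path or any file-like object, such as io.StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample)
# Empty fields are read as missing values (NaN) by default.
print(sample["length"].isna().tolist())  # [False, True]
```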

The `head()` method returns the first rows of the DataFrame: five by default, or as many as you pass as an argument. This is useful to get a quick overview of the data.

```python
df.head(5)
```

|   | song | artist | length | language |
|---|------|--------|--------|----------|
| 0 | BIRDS OF A FEATHER | Billie Eilish | 3:30 | English |
| 1 | Espresso | Sabrina Carpenter | 2:55 | English |
| 2 | Not Like Us | Kendrick Lamar | 4:34 | English |
| 3 | Si Antes Te Hubiera Conocido | KAROL G | 3:16 | Spanish |
| 4 | Ik Wil Dansen | Froukje | 3:14 | Dutch |
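
A quick sketch of the default and explicit row counts for `head()`. It uses a hypothetical ten-row DataFrame with a single column `n`, so it runs without `songs.csv`.

```python
import pandas as pd

# Ten placeholder rows: one column holding the numbers 0 to 9.
numbers = pd.DataFrame({"n": list(range(10))})

print(len(numbers.head()))            # 5: the default when no argument is given
print(numbers.head(3)["n"].tolist())  # [0, 1, 2]: pass n to choose how many rows
```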


You can also use the `tail()` method to see the last rows of the DataFrame. As with `head()`, you can specify the number of rows you want to see by passing an argument.

```python
df.tail(5)
```

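|   | song | artist | length | language |
|---|------|--------|--------|----------|
| 5 | Too Sweet | Hozier | 4:11 | English |
| 6 | Europapa | Joost | 2:40 | Dutch |
| 7 | Who | Jimin | NaN | English |
| 8 | Supernova | aespa | 2:59 | Korean |
| 9 | LUNCH | Billie Eilish | 3:00 | English |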
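
`tail()` mirrors `head()`, counting from the end instead. Here is a minimal sketch on the same kind of hypothetical ten-row frame:

```python
import pandas as pd

numbers = pd.DataFrame({"n": list(range(10))})

# Five rows from the end by default, or n rows when an argument is given.
print(numbers.tail()["n"].tolist())   # [5, 6, 7, 8, 9]
print(numbers.tail(2)["n"].tolist())  # [8, 9]
```

Note that the returned rows keep their original index labels (8 and 9 above), just as rows 5 to 9 do in the `df.tail(5)` output.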


Get a summary of the DataFrame with the `info()` method. It prints the number of rows, the number of columns, the data type of each column, and the number of non-null values in each column.

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   song      10 non-null     object
 1   artist    10 non-null     object
 2   length    9 non-null      object
 3   language  10 non-null     object
dtypes: object(4)
memory usage: 448.0+ bytes
```


#### Null values

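You may have noticed that `info()` reports the number of non-null values in each column. A null value is a value that is missing from the data; pandas usually displays it as `NaN`. Values can be missing for many reasons, such as data entry errors or information that was never recorded. In this dataset, the `length` column has only 9 non-null values out of 10 because the length of "Who" by Jimin is missing.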
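
You can also count missing values per column directly instead of reading them off `info()`. Below is a minimal sketch on a hypothetical three-row excerpt of the songs data, where `None` marks the missing length.

```python
import pandas as pd

# Hypothetical excerpt of the songs data; None is treated as a missing value.
excerpt = pd.DataFrame(
    {
        "song": ["Too Sweet", "Who", "LUNCH"],
        "length": ["4:11", None, "3:00"],
    }
)

# isna() marks each missing cell as True; summing per column counts them.
missing_per_column = excerpt.isna().sum()
print(missing_per_column)

# count() gives the complement: non-null values per column,
# the same figure info() shows as "Non-Null Count".
print(excerpt.count())
```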

Null values can cause problems when you work with data, for example by raising errors or skewing the results of calculations, so it's important to know how to handle them. The simplest approach is to remove them. The `dropna()` method does this: by default, it removes every row that contains at least one null value and returns a new DataFrame, which is why the result is assigned back to `df`.
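
Here is a minimal sketch of that behaviour on the same hypothetical three-row excerpt. It shows that the original DataFrame is untouched unless you reassign it, and it uses the `subset` parameter to limit which columns are checked.

```python
import pandas as pd

excerpt = pd.DataFrame(
    {
        "song": ["Too Sweet", "Who", "LUNCH"],
        "length": ["4:11", None, "3:00"],
    }
)

# dropna() returns a new DataFrame; excerpt itself still has 3 rows.
cleaned = excerpt.dropna()
print(len(excerpt))              # 3
print(cleaned["song"].tolist())  # ['Too Sweet', 'LUNCH']

# subset= restricts the check to the listed columns.
# No song titles are missing, so all 3 rows are kept.
print(len(excerpt.dropna(subset=["song"])))  # 3
```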

```python
df = df.dropna()
```

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Index: 9 entries, 0 to 9
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   song      9 non-null      object
 1   artist    9 non-null      object
 2   length    9 non-null      object
 3   language  9 non-null      object
dtypes: object(4)
memory usage: 360.0+ bytes
```
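
The summary now reads `Index: 9 entries, 0 to 9` instead of `RangeIndex`. This is because `dropna()` keeps the original row labels: label 7 ("Who") is gone, so the labels are no longer a continuous range. The sketch below reproduces this on the hypothetical excerpt. It also shows `reset_index(drop=True)`, which is one way to renumber the rows if you prefer consecutive labels.

```python
import pandas as pd

excerpt = pd.DataFrame(
    {
        "song": ["Too Sweet", "Who", "LUNCH"],
        "length": ["4:11", None, "3:00"],
    }
)

cleaned = excerpt.dropna()
# Surviving rows keep their original labels, leaving a gap where row 1 was.
print(cleaned.index.tolist())  # [0, 2]

# reset_index(drop=True) renumbers rows 0..n-1 and discards the old labels
# instead of inserting them as a new column.
renumbered = cleaned.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```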
