# Pandas


`pandas` is a popular open-source data analysis and manipulation library for Python. It is built on top of NumPy and provides easy-to-use data structures and tools for efficient data manipulation and analysis.

In this tutorial, we will cover the basic functionality of the pandas package including data structures, data selection and manipulation, and data analysis tools.


## Installation


First, you need to install the `pandas` package in your Python environment. You can install `pandas` using the following command in your terminal or command prompt:

```bash
pip install pandas
```


## Importing the pandas Package


To import the `pandas` package in your current Python session, use the following command:


```python
import pandas as pd
```


Here, we have imported the `pandas` package and assigned it the alias `pd`. You can use any alias you want, but `pd` is the standard convention in the Python community and will be used in the remainder of this tutorial.


## Data Structures


`pandas` provides two primary data structures: `Series` and `DataFrame`. A `Series` is a one-dimensional labeled array that can hold any data type, while a `DataFrame` is a two-dimensional labeled data structure with columns of potentially different data types.


### Series


A `Series` can be created using the `pd.Series()` function. The following example creates a `Series` object from a list:


```python
import pandas as pd

data = [1, 2, 3, 4, 5]
s = pd.Series(data)

print(s)
```


In the above code, we first imported pandas and created a list `data` containing some numbers. Then, we created a Series `s` using the `pd.Series()` function and passed the data list to it. Finally, we printed the `Series` to the console.
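Because a `Series` is a labeled array, each value has an index label. The following minimal sketch illustrates this, assuming made-up labels and values (`"alice"`, `"bob"`, `"carol"` and their scores are hypothetical): it compares the default integer labels with a custom index and looks up a value by label.

```python
import pandas as pd

# Hypothetical labels and values, chosen only for illustration.
scores = pd.Series([90, 85, 77], index=["alice", "bob", "carol"])

# Without an explicit index, pandas assigns integer labels starting at 0.
default_labels = pd.Series([1, 2, 3, 4, 5]).index.tolist()

# Look up a value by its label on the custom index.
bob_score = scores["bob"]

print(scores)
print(default_labels)  # [0, 1, 2, 3, 4]
print(bob_score)       # 85
```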


### DataFrame


A DataFrame can be created using the `pd.DataFrame()` function. The following example creates a DataFrame object from a dictionary:


```python
import pandas as pd

data = {
    "name": ["John", "Charlotte", "David", "Alice"],
    "age": [20, 21, 19, 18],
    "country": ["USA", "UK", "Canada", "France"],
}

df = pd.DataFrame(data)

print(df)
```


In the above code, we created a dictionary `data` containing three keys, `"name"`, `"age"`, and `"country"`, each mapped to a list of values. Each key becomes a column name and each list becomes that column's values. Then, we created a `DataFrame` `df` using the `pd.DataFrame()` function and passed the `data` dictionary to it. Finally, we printed the `DataFrame` to the console.
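Once a `DataFrame` exists, a few attributes help confirm that it has the expected structure. The sketch below reuses the same `data` dictionary and inspects `shape`, `columns`, and `dtypes`; the variable names `n_rows`, `n_cols`, and `column_names` are introduced here for illustration only.

```python
import pandas as pd

data = {
    "name": ["John", "Charlotte", "David", "Alice"],
    "age": [20, 21, 19, 18],
    "country": ["USA", "UK", "Canada", "France"],
}
df = pd.DataFrame(data)

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape

# columns holds the column labels, in dictionary key order.
column_names = df.columns.tolist()

print(n_rows, n_cols)  # 4 3
print(column_names)    # ['name', 'age', 'country']

# dtypes shows that each column can have its own data type.
print(df.dtypes)
```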


## Data Selection and Manipulation


`pandas` provides powerful tools for data selection and manipulation. In this section, we will cover some of the most commonly used tools. For a complete list of tools, refer to the official pandas documentation.


### Selecting Columns


You can select one or more columns from a `DataFrame` by using selection brackets `[]` with the column name(s). For example:


```python
import pandas as pd

data = {
    "name": ["John", "Charlotte", "David", "Alice"],
    "age": [20, 21, 19, 18],
    "country": ["USA", "UK", "Canada", "France"],
}

df = pd.DataFrame(data)
```


```python
# Selecting a single column
print(df["name"])
```


```python
# Selecting multiple columns
print(df[["name", "age"]])
```


Note the double square brackets `[[...]]` when selecting multiple columns: the outer brackets are the selection brackets `[]`, and the inner brackets form a Python list of column names. Passing several names without the inner list, as in `df["name", "age"]`, raises a `KeyError`. Notice also that selecting with a single name, `df["name"]`, returns a `Series` object, while selecting with a list, `df[["name", "age"]]`, returns a `DataFrame` object.
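The difference between the two forms can be checked directly. This is a small sketch on a shortened version of the example data; the names `single`, `one_col_frame`, and `raised` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Charlotte", "David", "Alice"],
    "age": [20, 21, 19, 18],
})

# A single column name returns a Series.
single = df["name"]

# A list of names returns a DataFrame, even if the list has one element.
one_col_frame = df[["name"]]

print(type(single).__name__)         # Series
print(type(one_col_frame).__name__)  # DataFrame

# Passing two names without the inner list is treated as one tuple key,
# which does not match any column and raises a KeyError.
try:
    df["name", "age"]
    raised = False
except KeyError:
    raised = True

print(raised)  # True
```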


Alternatively, you can select a single column as an attribute of the `DataFrame` using dot (`.`) notation. For example:


```python
import pandas as pd

data = {
    "name": ["John", "Charlotte", "David", "Alice"],
    "age": [20, 21, 19, 18],
    "country": ["USA", "UK", "Canada", "France"],
}

df = pd.DataFrame(data)

# Selecting a single column
print(df.name)
```


However, this method has caveats. It does not work when a column name contains spaces or special characters: a column named `"first name"` cannot be selected with `.` notation and requires selection brackets `[]` instead. Dot notation also cannot select multiple columns. Because it is arguably less readable and less powerful than selection brackets `[]`, using `[]` is recommended for most use cases.
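The sketch below illustrates these caveats on a small hypothetical `DataFrame` (the `"first name"` and `"count"` columns and their values are made up for this example). It also shows one further pitfall not mentioned above: when a column name matches an existing `DataFrame` method such as `count`, dot notation returns the method rather than the column.

```python
import pandas as pd

# Hypothetical data, chosen to trigger the dot-notation pitfalls.
df = pd.DataFrame({
    "first name": ["John", "Charlotte"],
    "count": [3, 5],
})

# A name with a space is not a valid Python attribute name,
# so selection brackets are the only option.
first_names = df["first name"].tolist()

# df.count is the DataFrame.count method, not the "count" column.
is_method = callable(df.count)

# Selection brackets always return the column.
counts = df["count"].tolist()

print(first_names)  # ['John', 'Charlotte']
print(is_method)    # True
print(counts)       # [3, 5]
```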


## Further Reading


Check out the following resources for more information on the `pandas` library:

- [Official pandas documentation](https://pandas.pydata.org/pandas-docs/stable/index.html)
- [pandas Cheat Sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)
- [pandas Cookbook](https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html)

Tutorials:

- [W3Schools](https://www.w3schools.com/python/pandas/default.asp)
- [DataCamp](https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python)
- [Real Python](https://realpython.com/pandas-python-explore-dataset/)
- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html)
- [kaggle](https://www.kaggle.com/learn/pandas)
