# PANDAS

__Pandas__ is a powerful _open-source_ data manipulation and analysis library for _Python_. 

It provides data structures and functions for efficiently __handling and analyzing structured data__, such as tables or spreadsheets.

With __pandas__, you can _load_, _manipulate_, and _analyze_ data, perform _data cleaning_ and _preprocessing_ tasks, and create _visualizations_.

It is widely used in _data science_, _machine learning_, and _data analysis_ projects.

To import the pandas library and give it the conventional alias `pd`, use `import pandas as pd`.

## The Series Data Structure

A __pandas Series__ is a _one-dimensional labeled array_ capable of holding any data type. It is similar to a _column_ in a spreadsheet or a SQL table, and in some ways it behaves like a _dictionary_ that maps labels to values. It is a fundamental _data structure_ in the __pandas__ library.

A __pandas Series__ consists of two main components: the _data_ and the _index_. The _data_ can be of any type, such as integers, floats, strings, or even complex objects. The _index_ is a sequence of labels that identifies each element in the Series; labels are usually unique, but pandas does not require them to be.

Some key features of pandas Series include:
- Vectorized operations: Series supports vectorized operations, allowing you to perform element-wise computations efficiently.
- Label-based indexing: You can access elements in a Series by label (with `loc[]`) as well as by integer position (with `iloc[]`).
- Alignment: Series automatically aligns data based on the index, making it easy to perform operations on multiple Series with different indexes.
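The three features listed above can be sketched with two small Series. The labels and values here are made up for illustration.

```python
import pandas as pd

# Two small Series with partially overlapping labels (illustrative data)
prices = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])
extras = pd.Series([1.0, 2.0, 3.0], index=["b", "c", "d"])

# Vectorized operation: every element is doubled without an explicit loop
doubled = prices * 2

# Label-based indexing: look up a value by its index label
price_b = prices["b"]

# Alignment: values are matched by label; labels present in only one
# Series produce NaN in the result
total = prices + extras
print(total)
```

Note that alignment introduces `NaN` for the labels `a` and `d`, which appear in only one of the two Series.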

To create a __Series__, you can pass a list, array, or dictionary-like object to the `pd.Series()` constructor. You can also specify custom index labels if needed.

In [9]:
import pandas as pd

# Create a Series object from a list of strings
list_elements = ['a', 'b', 'c', 'd', 'e']
serie_1 = pd.Series(list_elements)
print("Serie 1:", type(serie_1), "\n", serie_1)

# Create a Series object from a list of numbers
list_numbers = [1, 2, 3, 4, 5]
serie_2 = pd.Series(list_numbers)
print("Serie 2:", type(serie_2), "\n", serie_2)

# Create a Series object from a list of numbers with a None value
# (pandas stores None as NaN here and upcasts the dtype to float64)
list_numbers_with_none = [1, 2, None, 4, 5]
serie_3 = pd.Series(list_numbers_with_none)
print("Serie 3:", type(serie_3), "\n", serie_3)

Serie 1: <class 'pandas.core.series.Series'> 
 0    a
1    b
2    c
3    d
4    e
dtype: object
Serie 2: <class 'pandas.core.series.Series'> 
 0    1
1    2
2    3
3    4
4    5
dtype: int64
Serie 3: <class 'pandas.core.series.Series'> 
 0    1.0
1    2.0
2    NaN
3    4.0
4    5.0
dtype: float64


In [11]:
# Create a Series object from a dictionary
dict_data = {'a': 1, 'b':2, 'c': 3, 'd': 4}
serie_4 = pd.Series(dict_data)
print("Serie 4:", type(serie_4), "\n", serie_4)

# Get the values of the Series index
print("Serie 4 index:", serie_4.index)

Serie 4: <class 'pandas.core.series.Series'> 
 a    1
b    2
c    3
d    4
dtype: int64
Serie 4 index: Index(['a', 'b', 'c', 'd'], dtype='object')


In [None]:
# Create a series object from a list of tuple pairs
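One possible way to fill this cell, assuming each tuple is meant as a `(label, value)` pair. The `pairs` data is hypothetical. Passing the list directly does not split the pairs, so the sketch shows both behaviours.

```python
import pandas as pd

pairs = [("a", 1), ("b", 2), ("c", 3)]  # hypothetical (label, value) pairs

# Passing the list directly stores each tuple as a single element
serie_of_tuples = pd.Series(pairs)

# Converting to a dict first uses the first item of each pair as the label
serie_from_pairs = pd.Series(dict(pairs))
print(serie_from_pairs)
```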


In [None]:
# Create a series object from a list as values and a list as index
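A minimal sketch for this cell, using the `index` parameter of `pd.Series()`. The names and scores are illustrative.

```python
import pandas as pd

values = [90, 85, 70]
labels = ["alice", "bob", "carol"]

# The index list must have the same length as the values list
serie_scores = pd.Series(values, index=labels)
print(serie_scores)
```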


In [None]:
# Query a Series object by boolean indexing
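A short sketch of boolean indexing on made-up data: a comparison produces a boolean Series (the mask), and indexing with it keeps only the `True` positions.

```python
import pandas as pd

serie = pd.Series([5, 12, 7, 20], index=["a", "b", "c", "d"])

# The comparison returns a boolean Series aligned with the original
mask = serie > 8
filtered = serie[mask]

# Conditions are combined with & and |, each wrapped in parentheses
combined = serie[(serie > 6) & (serie < 15)]
print(filtered)
print(combined)
```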


In [None]:
# Query a Series object by fancy indexing
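Fancy indexing here is taken to mean selecting several elements at once by passing a list of labels or positions. A sketch on illustrative data:

```python
import pandas as pd

serie = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# A list of labels returns the elements in the order requested
by_labels = serie[["d", "a"]]

# A list of integer positions works through iloc
by_positions = serie.iloc[[0, 2]]
print(by_labels)
print(by_positions)
```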


In [None]:
# Query a Series object using loc[]

In [None]:
# Query a Series object using iloc[]
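The difference between `loc[]` and `iloc[]` is easiest to see when the index labels are themselves integers. The following sketch uses made-up data for that purpose.

```python
import pandas as pd

serie = pd.Series(["x", "y", "z"], index=[10, 20, 30])

# loc looks up by index label
by_label = serie.loc[10]

# iloc looks up by integer position (0-based)
by_position = serie.iloc[0]

# Label slices include the end label; position slices exclude the end
label_slice = serie.loc[10:20]
position_slice = serie.iloc[0:1]
```

Both lookups return `"x"` here, but `serie.loc[0]` would raise a `KeyError` because `0` is not a label in this index.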

## The DataFrame Data Structure

A __pandas DataFrame__ is a _two-dimensional_, _labeled_ data structure in _Python_ that is commonly used for _data manipulation and analysis_. It consists of _rows_ and _columns_, similar to a table in a relational database.

The __DataFrame__ can store _heterogeneous data types_ and provides various operations and functions to perform data manipulation, filtering, grouping, and statistical analysis.

To access and manipulate the data in the __DataFrame__, you can use various _methods_ and _attributes_ provided by the __pandas__ library.

For more information on __pandas DataFrame__, refer to the [official pandas documentation](https://pandas.pydata.org/docs/reference/frame.html).

In [None]:
# create dataframes from lists
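Two common ways to build a DataFrame from plain lists, sketched with illustrative names and ages: one list per column, or one list per row.

```python
import pandas as pd

names = ["Ana", "Luis", "Marta"]
ages = [28, 35, 42]

# One list per column, keyed by column name
df_from_columns = pd.DataFrame({"name": names, "age": ages})

# One inner list per row, with column names given separately
rows = [["Ana", 28], ["Luis", 35]]
df_from_rows = pd.DataFrame(rows, columns=["name", "age"])
print(df_from_columns)
```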



In [None]:
# create a dataframe from a list of dictionaries
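A sketch with hypothetical records. Each dictionary becomes one row, and keys missing from a record become missing values in that row.

```python
import pandas as pd

records = [
    {"name": "Ana", "age": 28},
    {"name": "Luis", "city": "Lima"},  # no "age" key in this record
]

# Columns are the union of all keys found in the records
df_records = pd.DataFrame(records)
print(df_records)
```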


In [None]:
# create a dataframe from a csv file
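A minimal sketch of `pd.read_csv()`. To keep it self-contained, it reads from an in-memory buffer instead of a file on disk; in practice you would pass a file path. The CSV content is made up.

```python
import io

import pandas as pd

# Stand-in for a file on disk (illustrative content)
csv_text = "name,age\nAna,28\nLuis,35\n"

# read_csv accepts a path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```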


In [None]:
# create a dataframe from a json file
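A minimal sketch of `pd.read_json()`, again reading from an in-memory buffer rather than a real file. The JSON content is made up and assumed to be a list of records.

```python
import io

import pandas as pd

# Stand-in for a JSON file on disk (illustrative content)
json_text = '[{"name": "Ana", "age": 28}, {"name": "Luis", "age": 35}]'

# orient="records" says the JSON is a list of row objects
df_json = pd.read_json(io.StringIO(json_text), orient="records")
print(df_json)
```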


In [None]:
# describe a dataframe
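A sketch of `describe()` on illustrative data. By default it summarizes only the numeric columns.

```python
import pandas as pd

df = pd.DataFrame({"age": [20, 30, 40], "city": ["Lima", "Quito", "Lima"]})

# Count, mean, std, min, quartiles, and max for numeric columns
summary = df.describe()
print(summary)
```

Passing `include="all"` to `describe()` also summarizes non-numeric columns.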


In [None]:
# get information about a dataframe


In [None]:
# indexes and columns
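A sketch of inspecting and changing row and column labels, using made-up data. The new column name `age_years` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Luis"], "age": [28, 35]})

print(df.index)    # row labels (a RangeIndex by default)
print(df.columns)  # column labels

# Use an existing column as the row index
by_name = df.set_index("name")

# Rename a column; the original DataFrame is left unchanged
renamed = df.rename(columns={"age": "age_years"})
```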


### Using Datetime in Pandas

In [None]:
# converting a column to datetime with to_datetime()
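A sketch of `pd.to_datetime()` on a text column with made-up values. With `errors="coerce"`, values that cannot be parsed become `NaT` instead of raising an error.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2024-01-15", "2024-02-20", "not a date"]})

# Parse strings into datetimes; unparseable values become NaT
df["date_parsed"] = pd.to_datetime(df["date"], errors="coerce")

# The .dt accessor exposes datetime components
df["year"] = df["date_parsed"].dt.year
print(df)
```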


In [None]:
# converting a column from datetime to string with strftime()
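A sketch using the `.dt.strftime()` accessor on illustrative dates. The format string follows the standard Python `strftime` codes.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2024-01-15", "2024-02-20"]))

# Format each datetime as day/month/year text
as_text = dates.dt.strftime("%d/%m/%Y")
print(as_text)
```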


In [None]:
# converting a column from datetime to a timestamp with timestamp()
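One way to read this cell is converting each datetime to a POSIX timestamp (seconds since 1970-01-01 UTC). `timestamp()` is a method of individual `pd.Timestamp` objects, so the sketch applies it element by element. The dates are parsed as UTC to keep the result independent of the local time zone.

```python
import pandas as pd

# utc=True makes the datetimes time-zone aware
dates = pd.to_datetime(pd.Series(["1970-01-01", "1970-01-02"]), utc=True)

# Call Timestamp.timestamp() on every element
seconds = dates.apply(lambda ts: ts.timestamp())
print(seconds)
```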

### Queries and Transformations

In [None]:
# query a dataframe by column


In [None]:
# query a dataframe by row with loc


In [None]:
# query a dataframe by row with iloc
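A sketch contrasting row selection with `loc` (by label) and `iloc` (by position). The data and index labels are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [28, 35, 42], "city": ["Lima", "Quito", "Bogota"]},
    index=["ana", "luis", "marta"],
)

# loc selects rows by index label; a single row comes back as a Series
row_luis = df.loc["luis"]

# iloc selects rows by integer position (0-based)
first_row = df.iloc[0]

# Both accept a column selector as a second argument
age_marta = df.loc["marta", "age"]
last_city = df.iloc[-1, 1]
```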


In [None]:
# query a dataframe using a boolean mask
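A sketch of filtering rows with a boolean mask built from column comparisons. The data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Luis", "Marta"], "age": [28, 35, 42]})

# Each comparison yields a boolean Series; combine them with & or |
mask = (df["age"] > 30) & (df["name"] != "Marta")

# Indexing with the mask keeps only rows where it is True
selected = df[mask]
print(selected)
```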


In [None]:
# query a dataframe using query()
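A sketch of `DataFrame.query()`, which takes the filter condition as a string that refers to columns by name. The data is made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ana", "Luis", "Marta"],
     "age": [28, 35, 42],
     "city": ["Lima", "Lima", "Quito"]}
)

# Column names are used directly inside the expression
result = df.query("age > 30 and city == 'Lima'")
print(result)
```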


In [None]:
# get missing values using isnull()


In [None]:
# fill missing values using fillna()


In [None]:
# drop missing values using dropna()
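The three missing-value helpers named in the cells above (`isnull()`, `fillna()`, `dropna()`) can be sketched together on one small, made-up DataFrame. The fill values are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", None, "z"]})

# isnull() marks missing cells; summing counts them per column
missing_per_column = df.isnull().sum()

# fillna() replaces missing cells, here with one value per column
filled = df.fillna({"a": 0.0, "b": "unknown"})

# dropna() removes every row that has at least one missing cell
dropped = df.dropna()
```

None of these calls modifies `df` in place; each returns a new object.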


In [None]:
# transform column using to_datetime()


In [None]:
# transform column using to_numeric()
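A sketch of `pd.to_numeric()` on text values. With `errors="coerce"`, entries that are not valid numbers become `NaN`. The input values are illustrative.

```python
import pandas as pd

raw = pd.Series(["1", "2.5", "abc"])

# Convert text to numbers; invalid entries become NaN
numbers = pd.to_numeric(raw, errors="coerce")
print(numbers)
```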


In [None]:
# convert column to category using astype()
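A sketch of converting a text column with few distinct values to the `category` dtype, which can reduce memory use. The column name and values are made up.

```python
import pandas as pd

df = pd.DataFrame({"size": ["S", "M", "L", "M"]})

# Store the column as categories instead of repeated strings
df["size"] = df["size"].astype("category")

# The .cat accessor exposes the distinct categories
categories = list(df["size"].cat.categories)
print(categories)
```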


In [None]:
# merge dataframes using merge()
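A sketch of `merge()` on a shared key column, comparing an inner and a left join. The tables and the key name `customer_id` are hypothetical.

```python
import pandas as pd

customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["Ana", "Luis", "Marta"]}
)
orders = pd.DataFrame(
    {"customer_id": [1, 1, 3, 4], "amount": [50, 20, 70, 10]}
)

# Inner join: only keys present in both tables
inner = customers.merge(orders, on="customer_id", how="inner")

# Left join: every customer is kept; missing orders become NaN
left = customers.merge(orders, on="customer_id", how="left")
print(left)
```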


In [None]:
# concatenate dataframes using concat()
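A sketch of `pd.concat()` stacking DataFrames vertically and side by side. The data is illustrative.

```python
import pandas as pd

df_a = pd.DataFrame({"x": [1, 2]})
df_b = pd.DataFrame({"x": [3, 4]})

# Stack rows; ignore_index=True renumbers the result from 0
stacked = pd.concat([df_a, df_b], ignore_index=True)

# axis=1 places the frames next to each other, aligned on the index
side_by_side = pd.concat([df_a, df_b.rename(columns={"x": "y"})], axis=1)
print(stacked)
```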


In [None]:
# join dataframes using join()
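A sketch of `DataFrame.join()`, which combines frames on their index and performs a left join by default. The labels and values are made up.

```python
import pandas as pd

left = pd.DataFrame({"age": [28, 35]}, index=["ana", "luis"])
right = pd.DataFrame({"city": ["Lima", "Quito"]}, index=["ana", "marta"])

# Rows of `left` are kept; labels missing from `right` get NaN
joined = left.join(right)
print(joined)
```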


In [None]:
# group dataframes using groupby()


In [None]:
# group and aggregate dataframes using groupby() and aggregate()
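A sketch of grouping by one column and computing several aggregates per group. The sales data is made up.

```python
import pandas as pd

sales = pd.DataFrame(
    {"region": ["north", "north", "south", "south"],
     "amount": [10, 20, 5, 15]}
)

# One row per region, one column per aggregate function
summary = sales.groupby("region")["amount"].aggregate(["sum", "mean"])
print(summary)
```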


In [None]:
# group and transform dataframes using groupby() and transform()
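A sketch of `transform()`, which returns a result with the same length as the input, so a per-group statistic can be stored next to each original row. The data and the new column names are illustrative.

```python
import pandas as pd

sales = pd.DataFrame(
    {"region": ["north", "north", "south", "south"],
     "amount": [10, 20, 5, 15]}
)

# The group mean is broadcast back to every row of its group
sales["region_mean"] = sales.groupby("region")["amount"].transform("mean")

# Deviation of each sale from its region's mean
sales["diff_from_mean"] = sales["amount"] - sales["region_mean"]
print(sales)
```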


In [None]:
# group and filter dataframes using groupby() and filter()
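A sketch of `filter()` on a grouped DataFrame, which keeps or drops whole groups based on a condition. The data and the threshold of 25 are arbitrary.

```python
import pandas as pd

sales = pd.DataFrame(
    {"region": ["north", "north", "south", "south"],
     "amount": [10, 20, 5, 15]}
)

# Keep only the rows of regions whose total amount exceeds 25
big_regions = sales.groupby("region").filter(
    lambda group: group["amount"].sum() > 25
)
print(big_regions)
```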


In [None]:
# reshape a dataframe from long to wide using pivot()
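A sketch of `pivot()`, which reshapes a long table into a wide one. It requires each index/column pair to appear only once. The data is made up.

```python
import pandas as pd

long_df = pd.DataFrame(
    {"date": ["d1", "d1", "d2", "d2"],
     "city": ["Lima", "Quito", "Lima", "Quito"],
     "temp": [20, 15, 22, 16]}
)

# One row per date, one column per city, cells filled from "temp"
wide = long_df.pivot(index="date", columns="city", values="temp")
print(wide)
```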


In [None]:
# pivot dataframes using pivot_table()
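A sketch of `pivot_table()`, which also reshapes to a wide layout but aggregates duplicate index/column pairs. The data is illustrative.

```python
import pandas as pd

sales = pd.DataFrame(
    {"region": ["north", "north", "south", "south", "north"],
     "product": ["a", "b", "a", "a", "a"],
     "amount": [10, 20, 5, 15, 30]}
)

# Sum amounts per region/product; empty combinations become 0
table = sales.pivot_table(
    index="region", columns="product", values="amount",
    aggfunc="sum", fill_value=0,
)
print(table)
```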


### Advanced Transformations

In [None]:
# making transformations using apply()
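A sketch of `apply()` on a single column and across rows. The data, the threshold of 15, and the new column names are made up.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 2]})

# Series.apply calls the function once per element
df["price_label"] = df["price"].apply(lambda p: "high" if p > 15 else "low")

# DataFrame.apply with axis=1 calls the function once per row
df["total"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)
print(df)
```

For plain arithmetic such as the `total` column, the vectorized form `df["price"] * df["qty"]` is usually faster; `apply()` is most useful for logic that has no vectorized equivalent.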


In [None]:
# making transformations using chain transformations
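A sketch of method chaining, where each step returns a new DataFrame that feeds the next call. The raw data is made up to include messy names and a non-numeric age.

```python
import pandas as pd

raw = pd.DataFrame(
    {"name": [" ana ", "LUIS", "marta"], "age": ["28", "n/a", "42"]}
)

clean = (
    raw
    # Normalize names and convert ages, turning bad values into NaN
    .assign(
        name=lambda d: d["name"].str.strip().str.title(),
        age=lambda d: pd.to_numeric(d["age"], errors="coerce"),
    )
    # Drop rows whose age could not be parsed
    .dropna(subset=["age"])
    # Oldest first, with a fresh 0-based index
    .sort_values("age", ascending=False)
    .reset_index(drop=True)
)
print(clean)
```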


### Statistical Testing

In [None]:
# making a t-test with pandas and scipy


In [None]:
# making an ANOVA test with pandas and scipy


In [None]:
# making a chi-square test with pandas and scipy


In [None]:
# making a correlation validation with pandas 
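A sketch of checking correlations with pandas alone, using the default Pearson method. The three columns hold made-up values.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5],
     "y": [2, 4, 6, 8, 10],
     "z": [5, 3, 4, 1, 2]}
)

# Pairwise Pearson correlation between all numeric columns
corr_matrix = df.corr()

# Correlation between two specific columns
r_xy = df["x"].corr(df["y"])
print(corr_matrix)
```

Here `y` is an exact multiple of `x`, so their correlation is 1, while `x` and `z` are negatively correlated.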


In [None]:
# p-hacking example


In [None]:
# p-hacking example with multiple testing


In [None]:
# p-value example


In [None]:
# p-value correction with Bonferroni
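A sketch of the Bonferroni correction with plain pandas: each p-value is multiplied by the number of tests and capped at 1. The p-values and test names are invented for illustration and do not come from real tests.

```python
import pandas as pd

# Hypothetical p-values from four separate tests
p_values = pd.Series(
    [0.01, 0.04, 0.03, 0.20],
    index=["test_a", "test_b", "test_c", "test_d"],
)
alpha = 0.05
n_tests = len(p_values)

# Bonferroni: multiply by the number of tests, cap at 1.0
adjusted = (p_values * n_tests).clip(upper=1.0)

# Compare the adjusted p-values against the original alpha
significant = adjusted < alpha
print(pd.DataFrame({"p": p_values, "p_adjusted": adjusted,
                    "significant": significant}))
```

Three of the raw p-values fall below 0.05, but only `test_a` remains significant after the correction.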
