## Pandas  
***Pandas is a library for working with data. Its two best-known structures are:***
- **`Series` and `DataFrame`**

It can be a potential substitute for Excel.

### Series
A `Series` is a data structure very similar to a one-dimensional array, with a label (index) attached to each value.

**Format**:
`Series([data, index, dtype, name, copy, ...])`

#### Syntax 
```python
import pandas as pd
a = list(range(10))
a_series = pd.Series(a)  # the class name is capitalized: pd.Series
```
#### Some series attributes
 Attribute | Return | Description
 --------- | ------ | -----------
 Series.index | Index (a `RangeIndex` by default) | Returns the index (labels) of the series
 Series.dtype | dtype | Returns the data type of the values, e.g. `dtype('int64')`
 Series.size | int | Returns the number of elements
 Series.name | str | Returns the name of the series

**Note**: Many attributes are common to `Series` and `DataFrame`.

##### Example 
```python 
import pandas as pd
numbers = [i + 10 for i in range(10)]
index = list('abcdefghij')
numbers_s = pd.Series(numbers, index=index, name='numbers')
```
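As a minimal sketch of the attributes listed above, the snippet below rebuilds the same series and reads each attribute. The exact integer dtype depends on the platform, so it is not asserted here.

```python
import pandas as pd

# Same values and labels as the example above: 10..19 labelled 'a'..'j'
numbers_s = pd.Series([i + 10 for i in range(10)],
                      index=list('abcdefghij'), name='numbers')

print(numbers_s.index)   # the labels 'a' through 'j'
print(numbers_s.dtype)   # an integer dtype (usually int64)
print(numbers_s.size)    # 10
print(numbers_s.name)    # numbers
print(numbers_s['c'])    # 12, the value stored under label 'c'
```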
### DataFrame
***DataFrames are data structures in table format, with many features similar to Excel.***
#### Format
`DataFrame([data, index, columns, dtype, copy])`
#### Syntax
```python
import pandas as pd
a = list(range(10))
b = list(range(10))
data = {'Col A': a,
        'Col B': b}
data_df = pd.DataFrame(data)
```
#### Some DataFrame attributes

 Attribute | Return | Description
 --------- | ------ | -----------
 DataFrame.index | Index (a `RangeIndex` by default) | Returns the row index (labels) of the DataFrame
 DataFrame.columns | Index | Returns the column names
 DataFrame.dtypes | Series | Returns a series with the data type of each column
 DataFrame.values | ndarray | Returns a NumPy array with the values
 DataFrame.size | int | Returns the number of values in the DataFrame (rows × columns)
 DataFrame.shape | tuple | Returns a tuple with the number of rows and columns in the DataFrame

##### Column Selection: 
- `DataFrame[ColumnName]` returns a Series with the selected column.
- `DataFrame[[ColumnNameA, ColumnNameB, ...]]` returns a DataFrame with the selected columns.
- `DataFrame[newName] = newData` adds `newData` as a new column called `newName` (or overwrites that column if it already exists).

##### Examples
```python
import pandas as pd
import numpy as np
a = list('abcdefghijkl')
b = np.linspace(0, 20, 12)
data = {'Char': a,
        'numbers': b}

data_df = pd.DataFrame(data)
c = pd.Series(range(12), name='Series Example b')
data_df[c.name] = c  # adds c as a new column named after the series
```
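The following sketch ties the attribute table to the column-selection rules, using the same two columns as the example above. The column name `doubled` is a hypothetical name chosen for illustration.

```python
import numpy as np
import pandas as pd

data_df = pd.DataFrame({'Char': list('abcdefghijkl'),
                        'numbers': np.linspace(0, 20, 12)})

print(data_df.shape)           # (12, 2): 12 rows, 2 columns
print(data_df.size)            # 24: rows * columns
print(list(data_df.columns))   # ['Char', 'numbers']

one_col = data_df['numbers']              # single name -> Series
two_cols = data_df[['Char', 'numbers']]   # list of names -> DataFrame

# Assigning to a new name adds a column ('doubled' is a made-up name)
data_df['doubled'] = data_df['numbers'] * 2
```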

#### Basic Methods for DataFrame
 Method | Description
 ------ | -----------
 DataFrame.head([n]) | Returns the first n rows of the DataFrame. Default is 5
 DataFrame.tail([n]) | Returns the last n rows of the DataFrame. Default is 5
 DataFrame.min([axis]) | Returns the minimum. Default is `axis = 0` (one result per column); `axis = 1` gives one result per row
 DataFrame.max([axis]) | Returns the maximum. Default is `axis = 0` (one result per column); `axis = 1` gives one result per row
 DataFrame.cumsum([axis]) | Returns the cumulative sum. Default is `axis = 0` (accumulates down each column); `axis = 1` accumulates across each row
 DataFrame.value_counts() | Returns a series with the number of times each distinct row appears; use `Series.value_counts()` to count the values of a single column
 DataFrame.sort_values(by) | Sorts the DataFrame by the column given in `by = 'ColumnName'`

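A short sketch of the methods in the table above, run on a small made-up DataFrame (the `city` and `sales` columns are hypothetical). It uses single columns for `max`, `cumsum` and `value_counts` so that the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({'city': ['A', 'B', 'A', 'C', 'B', 'A'],
                   'sales': [5, 3, 8, 1, 7, 2]})

first_two = df.head(2)                    # first 2 rows
last_two = df.tail(2)                     # last 2 rows
col_max = df['sales'].max()               # 8
running_total = df['sales'].cumsum()      # 5, 8, 16, 17, 24, 26
city_counts = df['city'].value_counts()   # A: 3, B: 2, C: 1
by_sales = df.sort_values(by='sales')     # ascending by the 'sales' column
```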
### Reading and writing files
**The function for reading a CSV file**:
`pandas.read_csv(filepath_or_buffer, sep=',', encoding=None, ...)`

**Methods for writing CSV and Excel files**:
`DataFrame.to_excel(excel_writer, sheet_name='Sheet1', ...)`  
`DataFrame.to_csv(path_or_buf, sep=',', ...)`

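A minimal round-trip sketch for the CSV functions above. To stay self-contained it writes to a string and reads from an in-memory buffer instead of a real file; the table contents and the `;` separator are assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical table, used only for illustration
df = pd.DataFrame({'name': ['Alice', 'Bob'], 'score': [9.5, 7.0]})

# With no path, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(sep=';', index=False)

# read_csv accepts a path or a file-like object; sep must match the writer
df_back = pd.read_csv(io.StringIO(csv_text), sep=';')
```

Passing `index=False` keeps the row index out of the file, so reading it back does not produce an extra unnamed column.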
### Selecting rows and columns with loc and iloc
**`loc` and `iloc` are properties of DataFrames that let us access rows and columns in a way similar to slicing lists.**
#### Properties of `loc` and `iloc`

 Property | Description
 -------- | -----------
 loc[i,j] | where i and j are the labels (names) of the selected rows and columns
 iloc[i,j] | where i and j are the integer positions of the selected rows and columns

##### Example
```python
import pandas as pd
import numpy as np
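File = r'...\path'
df = pd.read_csv(File)
df.index = np.arange(len(df)) + 10.1  # float labels: 10.1, 11.1, 12.1, ...
df.head(10)
col = ['Energy Source', 'Location']
df.loc[10.1:14.1, col]  # rows by label (end label included), columns by name
df.iloc[0:4, 0:5]       # rows and columns by position (end position excluded)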
```
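Because the example above depends on an external file, here is a self-contained sketch of the same idea. The data and the row labels `r1` to `r4` are made up for illustration. It highlights one difference that is easy to miss: a `loc` slice includes its end label, while an `iloc` slice excludes its end position.

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame({'Energy Source': ['Solar', 'Wind', 'Coal', 'Gas'],
                   'Location': ['North', 'South', 'East', 'West'],
                   'Capacity (MW)': [100, 250, 900, 400]},
                  index=['r1', 'r2', 'r3', 'r4'])

# loc slices by label and includes the end label: rows r1, r2, r3
by_label = df.loc['r1':'r3', ['Energy Source', 'Location']]

# iloc slices by position and excludes the end position: rows 0 and 1
by_position = df.iloc[0:2, 0:2]

single_value = df.loc['r2', 'Capacity (MW)']   # 250
```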
### Filters 
**DataFrame filters are built from boolean Series: indexing a DataFrame with a boolean Series keeps only the rows where it is `True`.**
#### Syntax
`colfilter = DataFrame['Column_to_filter'] == 'Filter_Value'`  
`DataFrame[colfilter]`
##### Example demonstration 
```python
import pandas as pd
import numpy as np
file = 'path'
df = pd.read_csv(file)
colfilter = df['Capacity (MW)'] > 6000  # boolean Series, one value per row
df[colfilter]                           # rows where the condition is True
renewables_filter = df['Renewables'] == 'Yes'
df_renewables_filter = df[renewables_filter]
```
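Since filters are ordinary boolean Series, they can also be combined. The sketch below uses made-up data with the same column names as the example above and combines two filters with `&` (and), `|` (or) and `~` (not).

```python
import pandas as pd

# Hypothetical data; column names follow the example above
df = pd.DataFrame({'Capacity (MW)': [500, 7000, 12000, 6500],
                   'Renewables': ['Yes', 'No', 'Yes', 'Yes']})

big = df['Capacity (MW)'] > 6000          # boolean Series
renewable = df['Renewables'] == 'Yes'     # boolean Series

big_and_renewable = df[big & renewable]   # both conditions hold
big_or_renewable = df[big | renewable]    # at least one condition holds
not_renewable = df[~renewable]            # negated condition
```

When the comparisons are written inline, each one needs its own parentheses, for example `df[(df['Capacity (MW)'] > 6000) & (df['Renewables'] == 'Yes')]`, because `&` binds more tightly than `>` and `==`.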

### Data Cleaning and preprocessing 
**Often, before we start our analysis, we need to clean and treat the data in a DataFrame.**
#### Methods for DataFrame processing

 Method | Description
 ------ | -----------
 DataFrame.drop(labels=None, axis=0, ...) | Removes the rows (`axis=0`) or columns (`axis=1`) with the given labels
 DataFrame.isnull() / DataFrame.notnull() | Returns a boolean DataFrame marking null (or non-null) cells
 DataFrame.dropna() | Deletes rows or columns with null cells
 DataFrame.fillna() | Replaces null values with a given value
 DataFrame.duplicated() | Returns a boolean series marking duplicated rows
 DataFrame.drop_duplicates() | Deletes duplicated rows; you can restrict the comparison to certain columns using `subset`

**Note**: By default these methods do not modify the DataFrame; they return a new object, so it is useful to store the result in a new variable.

##### Example
```python
import pandas as pd
file = 'path'
df = pd.read_csv(file)
df.head(10).drop(['Yield (%)', 'Reaction Time (min)'], axis=1)  # axis=1 removes columns
df.head(10).notnull()    # True where a cell has a value
df.head(10).fillna(999)  # null cells replaced by 999
```
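The sketch below runs the cleaning methods from the table on a small made-up DataFrame with one missing value and one duplicated row, so each result can be checked by hand. The column names follow the example above; the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one duplicated row
df = pd.DataFrame({'Yield (%)': [90.0, np.nan, 75.0, 75.0],
                   'Reaction Time (min)': [30, 45, 60, 60]})

no_nulls = df.dropna()                # drops the row containing NaN
filled = df.fillna(0)                 # NaN replaced by 0
dup_mask = df.duplicated()            # True only for the repeated last row
unique_rows = df.drop_duplicates()    # keeps the first copy of each row
no_time = df.drop(['Reaction Time (min)'], axis=1)  # removes a column

# df itself is unchanged: every call above returned a new object
```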

### Joining tables
***Join Method***: This method joins columns from DataFrames **on their indices**. It is important to understand how to use it carefully, as it is a powerful tool for merging information from databases.

#### Format
`DataFrame.join(other, on = None, how = 'left', lsuffix = '', rsuffix = '', sort = False)`

*Observations*:
- `other`: another object (DataFrame, Series or a list of DataFrames)
- `how`: `left` = index of the calling DataFrame, `right` = index of *other*, `outer` = union of both indices, `inner` = intersection of both indices
- `lsuffix`, `rsuffix`: suffixes appended to overlapping column names from the left and right DataFrame, respectively

##### Example 
```python
import pandas as pd
df1 = pd.DataFrame({'Employee_id': [1, 2, 3, 4],
                    'Employee_Name': ['Alice', 'Bob', 'Charlie', 'David']})
df2 = pd.DataFrame({'Employee_id': [3, 4, 5, 6],
                    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
                    'Employee_Name': ['Charlie', 'David', 'Rafael', 'Lucas']})
df1.set_index('Employee_id', inplace=True)
df2.set_index('Employee_id', inplace=True)  # Setting the Employee id as the joining index
# Both frames have 'Employee_Name'; rsuffix renames the copy coming from df2
df3 = df1.join(df2, how='outer', rsuffix='_b')
```
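To make the effect of `how` concrete, this sketch joins two small frames with each option and records which index values survive. The frames reuse the employee ids from the example above; the variable names `left`, `right` and `result_index` are chosen for illustration.

```python
import pandas as pd

left = pd.DataFrame({'Employee_Name': ['Alice', 'Bob', 'Charlie', 'David']},
                    index=pd.Index([1, 2, 3, 4], name='Employee_id'))
right = pd.DataFrame({'Department': ['HR', 'IT', 'Finance', 'Marketing']},
                     index=pd.Index([3, 4, 5, 6], name='Employee_id'))

# Collect the resulting index for each join type
result_index = {how: left.join(right, how=how).index.tolist()
                for how in ['left', 'right', 'inner', 'outer']}

print(result_index['left'])    # [1, 2, 3, 4]
print(result_index['right'])   # [3, 4, 5, 6]
print(result_index['inner'])   # [3, 4]
print(result_index['outer'])   # [1, 2, 3, 4, 5, 6]
```

Rows that exist on only one side get `NaN` in the columns that come from the other side.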

### Concat Function
The `concat` function concatenates DataFrames, either by stacking rows (`axis = 0`) or by placing columns side by side (`axis = 1`).
#### Format
`pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, ...)`

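**Observations**:
- `objs`: a single argument, so several DataFrames are passed together as a list.
- `ignore_index`: if `True`, discards the existing index along the concatenation axis and renumbers it 0, 1, 2, ...
- `join`: how to handle the other axis; `outer` = union, `inner` = intersection.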

##### Example
```python
import pandas as pd
df1 = pd.DataFrame({'Employee_id': [1, 2, 3, 4],
                    'Employee_Name': ['Alice', 'Bob', 'Charlie', 'David']})
df2 = pd.DataFrame({'Employee_id': [3, 4, 5, 6],
                    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
                    'Employee_Name': ['Charlie', 'David', 'Rafael', 'Lucas']})
df3 = pd.DataFrame({'Employee_id': [2, 3, 4, 7],
                    'Salary': [70000, 80000, 90000, 100000]})
df1.set_index('Employee_id', inplace=True)
df2.set_index('Employee_id', inplace=True)  # Setting the Employee id as the joining index
df3.set_index('Employee_id', inplace=True)
frames = [df1, df2, df3]
# axis=1 places the frames side by side; 'inner' keeps only ids present in all three (3 and 4)
dfs_concated = pd.concat(frames, axis=1, join='inner', ignore_index=False)
```
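The example above concatenates along columns. As a complementary sketch, the snippet below stacks two made-up frames along rows (`axis = 0`) and shows what `ignore_index` changes. The names `jan` and `feb` are hypothetical.

```python
import pandas as pd

# Hypothetical frames sharing the same columns
jan = pd.DataFrame({'Employee_id': [1, 2], 'Salary': [70000, 80000]})
feb = pd.DataFrame({'Employee_id': [3, 4], 'Salary': [90000, 100000]})

# Original row labels are kept, so the index repeats: 0, 1, 0, 1
stacked = pd.concat([jan, feb], axis=0)

# ignore_index=True renumbers the rows: 0, 1, 2, 3
renumbered = pd.concat([jan, feb], axis=0, ignore_index=True)
```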
### Pivot Tables
A pivot table is a tool that groups the rows of a DataFrame by one or more keys and aggregates a column of values for each group.

#### Syntax
`pd.pivot_table(data, values = None, index = None, columns = None, aggfunc = 'mean', fill_value = None, margins = False, dropna = True, margins_name= 'All', observed = False, sort = True)`

**Notes**:
`aggfunc` can be, for example, `mean`, `sum`, `min` or `max`.

##### Usage Example 
`pd.pivot_table(df, index = 'Energy Source', aggfunc = 'mean', values = 'Efficiency (%)', columns = 'Location')`

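A self-contained sketch of the usage example above, run on a few made-up rows so the aggregated values can be verified by hand. The column names match the usage example; the data is hypothetical.

```python
import pandas as pd

# Hypothetical data; column names follow the usage example above
df = pd.DataFrame({
    'Energy Source': ['Solar', 'Solar', 'Wind', 'Wind'],
    'Location': ['North', 'South', 'North', 'North'],
    'Efficiency (%)': [20.0, 30.0, 40.0, 50.0],
})

# One row per energy source, one column per location, mean efficiency in each cell
table = pd.pivot_table(df, index='Energy Source', columns='Location',
                       values='Efficiency (%)', aggfunc='mean')

print(table.loc['Wind', 'North'])    # 45.0, the mean of 40.0 and 50.0
print(table.loc['Solar', 'South'])   # 30.0
# ('Wind', 'South') has no rows, so that cell is NaN unless fill_value is set
```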



