# pandas

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

## Why Use Pandas?
Pandas lets us analyze large data sets and draw conclusions based on statistical theory.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

In this notebook I will explain five pandas techniques with code examples:

* Boolean Indexing

Filtering data from a dataset is one of the most common and basic operations. There are numerous ways to filter (or subset) data in pandas with boolean indexing. Boolean indexing (also known as boolean selection) can be a confusing term, but for the purposes of pandas, it refers to selecting rows by providing a boolean value (True or False) for each row. These boolean values are usually stored in a Series or NumPy ndarray and are usually created by applying a boolean condition to one or more columns in a DataFrame.

In [27]:
import pandas as pd
data = pd.DataFrame({'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]})
data

Unnamed: 0,Name,Age
0,Tom,20
1,Joseph,21
2,Krish,19
3,John,18


In [28]:
bool_serie = data['Age'] < 20
bool_serie

0    False
1    False
2     True
3     True
Name: Age, dtype: bool

In [29]:
data_filtered = data[bool_serie]
data_filtered

Unnamed: 0,Name,Age
2,Krish,19
3,John,18
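Conditions on more than one column can be combined into a single mask. The following is a minimal sketch on the same sample data; the second condition (names starting with 'J') and the variable names are only illustrative assumptions, not part of the original example.

```python
import pandas as pd

data = pd.DataFrame({'Name': ['Tom', 'Joseph', 'Krish', 'John'],
                     'Age': [20, 21, 19, 18]})

# Each condition produces its own boolean Series
under_20 = data['Age'] < 20
starts_with_j = data['Name'].str.startswith('J')

# Combine masks element-wise with & (and), | (or) and ~ (not);
# the Python keywords `and` / `or` do not work on Series
both = data[under_20 & starts_with_j]          # rows where both conditions hold
either = data[under_20 | starts_with_j]        # rows where at least one holds
neither = data[~(under_20 | starts_with_j)]    # rows where neither holds

# .loc accepts the same mask and can pick columns at the same time
names_under_20 = data.loc[under_20, 'Name']
```

When the conditions are written inline, each one needs its own parentheses, for example `data[(data['Age'] < 20) & (data['Name'] == 'John')]`, because `&` binds more tightly than the comparison operators.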


* Merging DataFrames

In practice, data is often spread across multiple files, with some columns appearing in more than one file. If you are familiar with databases and SQL, this will sound familiar: sometimes you need to join two tables into one to get the data you want. What SQL calls a 'join' is a 'merge' in pandas.


In [30]:
data_city = pd.DataFrame({'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'city': ['tunis', 'bizert', 'beja', 'tunis']})
data_city

Unnamed: 0,Name,city
0,Tom,tunis
1,Joseph,bizert
2,Krish,beja
3,John,tunis


In [31]:
df_merged = pd.merge(data,data_city,on='Name')
df_merged

Unnamed: 0,Name,Age,city
0,Tom,20,tunis
1,Joseph,21,bizert
2,Krish,19,beja
3,John,18,tunis
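As in SQL, the kind of join matters once the two tables do not contain exactly the same keys. The sketch below assumes a hypothetical city table, `partial_city`, in which 'John' is missing and 'Sara' is extra, and uses the `how` parameter of `pd.merge` to compare the results.

```python
import pandas as pd

data = pd.DataFrame({'Name': ['Tom', 'Joseph', 'Krish', 'John'],
                     'Age': [20, 21, 19, 18]})

# Hypothetical table for illustration: no row for 'John', extra row for 'Sara'
partial_city = pd.DataFrame({'Name': ['Tom', 'Joseph', 'Krish', 'Sara'],
                             'city': ['tunis', 'bizert', 'beja', 'sfax']})

# how='inner' is the default: keep only names present in both tables
inner = pd.merge(data, partial_city, on='Name')

# how='left': keep every row of the left table; missing cities become NaN
left = pd.merge(data, partial_city, on='Name', how='left')

# how='outer': keep every name from either table
outer = pd.merge(data, partial_city, on='Name', how='outer')
```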


It might happen that the columns on which you want to merge the DataFrames have different names (unlike in this case). For such merges, pass the name of the key column in the left DataFrame as left_on and the name of the key column in the right DataFrame as right_on, like: df_merged = pd.merge(data, data_city, left_on='Name1', right_on='name2')
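A runnable sketch of a merge on differently named key columns could look like this. The table `data_city_renamed` and its key column 'person' are hypothetical names introduced only for illustration.

```python
import pandas as pd

data = pd.DataFrame({'Name': ['Tom', 'Joseph', 'Krish', 'John'],
                     'Age': [20, 21, 19, 18]})

# Hypothetical variant of the city table whose key column is called 'person'
data_city_renamed = pd.DataFrame({'person': ['Tom', 'Joseph', 'Krish', 'John'],
                                  'city': ['tunis', 'bizert', 'beja', 'tunis']})

# left_on / right_on take column names, one from each DataFrame
df_merged_keys = pd.merge(data, data_city_renamed,
                          left_on='Name', right_on='person')

# Both key columns survive the merge, so drop the redundant one
df_merged_keys = df_merged_keys.drop(columns='person')
```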


* Method chaining

Method chaining is a programming style in which several method calls are invoked one after another, each call operating on the object returned by the previous one. Used well, method chaining can make code considerably more readable.


In [32]:
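# numeric_only=True skips the non-numeric 'Name' column, which recent pandas versions refuse to average
data_chained = pd.merge(data, data_city, on='Name').groupby('city').mean(numeric_only=True)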
data_chained

city,Age
beja,19
bizert,21
tunis,19
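Longer chains are often wrapped in parentheses with one method per line. The following is a sketch of that style on the same sample data; the particular steps (filtering on Age, adding a `year_birth` column, sorting) are illustrative choices rather than part of the original example.

```python
import pandas as pd

data = pd.DataFrame({'Name': ['Tom', 'Joseph', 'Krish', 'John'],
                     'Age': [20, 21, 19, 18]})
data_city = pd.DataFrame({'Name': ['Tom', 'Joseph', 'Krish', 'John'],
                          'city': ['tunis', 'bizert', 'beja', 'tunis']})

result = (
    data
    .merge(data_city, on='Name')                       # join on the shared key
    .query('Age >= 19')                                # keep rows with Age >= 19
    .assign(year_birth=lambda df: 2021 - df['Age'])    # add a derived column
    .sort_values('Age', ascending=False)               # oldest first
    .reset_index(drop=True)                            # renumber the rows
)
```

Each step returns a new DataFrame, so `data` and `data_city` are left unchanged.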


* Creating new columns

During a data analysis, it is extremely likely that you will need to create new columns to represent new variables. Commonly, these new columns will be created from previous columns already in the dataset.

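The simplest way to create a new column is to assign it a scalar value: place the name of the new column as a string in the indexing operator and assign to it. The same syntax accepts a Series computed from existing columns. Let's create a year of birth column by subtracting Age from 2021: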

In [33]:
df_merged['year_birth2'] = 2021 - df_merged['Age']
df_merged

Unnamed: 0,Name,Age,city,year_birth2
0,Tom,20,tunis,2001
1,Joseph,21,bizert,2000
2,Krish,19,beja,2002
3,John,18,tunis,2003
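For comparison, here is a small sketch of scalar assignment next to two other ways of deriving a column. The column names `country` and `is_adult` and the value 'Tunisia' are assumptions made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Joseph', 'Krish', 'John'],
                   'Age': [20, 21, 19, 18]})

# Scalar assignment: the single value is repeated for every row
df['country'] = 'Tunisia'

# Column derived from an existing column (element-wise comparison)
df['is_adult'] = df['Age'] >= 18

# assign returns a new DataFrame and leaves df itself unchanged
df_with_year = df.assign(year_birth=2021 - df['Age'])
```

Indexing assignment modifies the DataFrame in place, while `assign` fits better inside a method chain because it returns a copy.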


* Selecting DataFrame columns with filter 

An alternative method to select columns is with the filter method. This method is flexible and searches column names (or index labels) based on which parameter is used. Here, we use the like parameter to search for all column names that contain the exact string 'Age'



In [34]:
df_merged.filter(like='Age')

Unnamed: 0,Age
0,20
1,21
2,19
3,18


The filter method allows columns to be searched through regular expressions with the regex parameter. Here, we search for all columns that have a digit somewhere in their name:

In [35]:
df_merged.filter(regex=r'\d')

Unnamed: 0,year_birth2
0,2001
1,2000
2,2002
3,2003
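The three search parameters of filter can be compared side by side. This sketch rebuilds the merged table by hand so that it runs on its own; only one of `items`, `like`, and `regex` may be passed per call.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Joseph', 'Krish', 'John'],
                   'Age': [20, 21, 19, 18],
                   'city': ['tunis', 'bizert', 'beja', 'tunis'],
                   'year_birth2': [2001, 2000, 2002, 2003]})

# items: exact column names, returned in the order given
by_items = df.filter(items=['Name', 'city'])

# like: case-sensitive substring match ('Age' has no lowercase 'a')
by_like = df.filter(like='a')

# regex: column names containing at least one digit
by_regex = df.filter(regex=r'\d')

# axis=0 applies the same search to the index labels instead of the columns
by_rows = df.filter(items=[0, 2], axis=0)
```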
