### Tips And Tricks Of Pandas Library :

A quick way to get started with pandas is the short "10 minutes to pandas" tutorial in the official documentation. It is a tour of the most common operations, not of the whole library :

**Pandas Website Link :**

[Pandas Library](https://pandas.pydata.org/)

**How to read the documentation :**

The ***User Guide*** in the pandas documentation is the most complete guide to using the library. This trick is not only for pandas: every popular Python library has its own official website or documentation where we can learn how to use it. Some well-known libraries for data science are given below :

1. pandas
2. numpy
3. seaborn
4. matplotlib
5. plotly
6. scipy (including its `scipy.stats` module)
7. scikit-learn (imported as `sklearn`)
8. streamlit
9. keras

**Pandas User Guide Link :**

[Pandas User Guide](https://pandas.pydata.org/docs/user_guide/index.html)

### **Some tips for pandas :**

1. **Import pandas :** After importing the library with `import pandas as pd` and loading the data, the first step is usually to clean the data, that is, to find and fix the mistakes in it.
2. **Check Data Types :** We can check the data type of each column of our DataFrame using the `.dtypes` attribute.
3. **Check Missing Values :** We can flag missing values in our DataFrame using `.isna()` (or its alias `.isnull()`) and count them per column using `.isna().sum()`.
4. **Handle Missing Values :** We can fill missing values using `.fillna()`, `.interpolate()`, `.ffill()` or `.bfill()`, or drop the affected rows or columns using `.dropna()`.
5. **Convert Data Types :** We can convert data types using the `.astype()` method.
6. **Merge DataFrames :** We can combine two or more DataFrames using the `pd.concat()` function or the `.merge()` and `.join()` methods.
7. **Group By :** We can group our DataFrame by a column using the `.groupby()` method.
8. **Aggregation :** We can calculate the sum, mean, max, min, etc. of a column (for example a 'Values' column) for each group using the `.groupby()` and `.agg()` methods.
9. **Sort Data :** We can sort our DataFrame by a column using the `.sort_values()` method.
10. **Create New Column :** We can create a new column in our DataFrame using the `.assign()` method.
11. **Series :** A `pd.Series` is a one-dimensional labeled array holding data of any type such as integers, strings, Python objects etc.
12. **np.nan :** `np.nan` is a special `float` value ("not a number") that pandas uses to mark missing data.
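
Tips 2 to 10 can be tried together on a small table. The sketch below is only an illustration: the data, the column names (`store`, `sales`, `units`) and the `managers` lookup table are made up for this example and are not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value, and a numeric column stored as text.
df = pd.DataFrame(
    {
        "store": ["north", "south", "north", "south"],
        "sales": [100.0, np.nan, 300.0, 50.0],
        "units": ["1", "2", "3", "4"],
    }
)

print(df.dtypes)                                       # tip 2: data type of each column
missing = df.isna().sum()                              # tip 3: missing values per column
df["sales"] = df["sales"].fillna(df["sales"].mean())   # tip 4: fill with the column mean
df["units"] = df["units"].astype("int64")              # tip 5: text to integers

# tip 6: attach a column from a second (hypothetical) lookup table.
managers = pd.DataFrame({"store": ["north", "south"], "manager": ["Ali", "Sara"]})
merged = df.merge(managers, on="store", how="left")

# tips 7 and 8: group by store, then aggregate the sales of each group.
totals = merged.groupby("store")["sales"].agg(["sum", "mean"])

ranked = merged.sort_values(by="sales", ascending=False)              # tip 9
priced = ranked.assign(unit_price=lambda d: d["sales"] / d["units"])  # tip 10

print(totals)
print(priced)
```

Filling with the column mean is just one possible choice for tip 4; `.dropna()` or `.interpolate()` may suit other data better.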

13. **`pd.date_range()` :** The `pd.date_range()` function generates a sequence of dates and returns a fixed-frequency `DatetimeIndex`. Its syntax is given below :

**Syntax :**

`pd.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, inclusive='both')`

* `start` : `str` or `datetime-like` , optional

  * The starting date of the date range. This can be a string in a recognized date format or a datetime-like object.
* `end` : `str` or `datetime-like` , optional

  * The ending date of the date range. This can be a string in a recognized date format or a datetime-like object.
* `periods` : `int` , optional

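  * The number of periods (dates) to generate. Of the four parameters `start`, `end`, `periods` and `freq`, exactly three must be specified. When `start`, `end` and `periods` are given without `freq`, the dates are evenly spaced between `start` and `end`.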
* `freq` : `str` or `DateOffset` , default 'D'

  * The frequency of the date range. This can be a string representing a frequency alias (e.g., 'D' for daily, 'h' for hourly; pandas versions before 2.2 spell hourly as 'H') or a DateOffset object.
* `tz` : `str` or `tzinfo` , optional

  * The time zone for the resulting DatetimeIndex. This can be a string representing the time zone name or a tzinfo object.
* `normalize` : `bool` , default False

  * If True, the start and end dates are normalized to midnight before generating the date range.
* `name` : `str` , optional

  * The name of the resulting DatetimeIndex.
* `closed` : `str` , default None

  * Deprecated in pandas 1.4 and removed in pandas 2.0; use `inclusive` instead. In older versions, 'left' or 'right' closed the interval on that side only, and None (the default) included both endpoints.
* `inclusive` : `str` , default 'both'

  * Controls whether the boundaries are included. Options are 'both' for inclusive on both sides, 'neither' for inclusive on neither side, 'left' for inclusive on the left side, or 'right' for inclusive on the right side.

**Example Usage :**
```py
import pandas as pd

# Generate a date range from 2021-01-01 to 2021-01-10
date_range = pd.date_range(start='2021-01-01', end='2021-01-10')

# Generate a date range with 10 periods starting from 2021-01-01
date_range = pd.date_range(start='2021-01-01', periods=10)

# Generate a date range with a frequency of 2 days
date_range = pd.date_range(start='2021-01-01', end='2021-01-10', freq='2D')

# Generate a date range with a specified time zone
date_range = pd.date_range(start='2021-01-01', end='2021-01-10', tz='UTC')

# Generate a date range with normalized start and end dates
date_range = pd.date_range(start='2021-01-01 10:00', end='2021-01-10 10:00', normalize=True)
```
These examples demonstrate how to use the various parameters to customize the generated date range.
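
Two parameters are easy to misread, so here is a small sketch of them. It assumes pandas 1.4 or newer (where `inclusive` is available); the variable names are only illustrative.

```python
import pandas as pd

# start + end + periods (no freq): the dates are spaced evenly between the endpoints.
evenly_spaced = pd.date_range(start="2021-01-01", end="2021-01-10", periods=4)

# inclusive="left" keeps the start date and drops the end date.
left_closed = pd.date_range(start="2021-01-01", end="2021-01-04", inclusive="left")

print(evenly_spaced)
print(left_closed)
```

With four periods over nine days, the dates land three days apart, and the left-inclusive range stops one day before its `end`.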

14. Names followed by brackets `()` are `methods` (functions attached to an object, which we call), and names without brackets `()` are `attributes` (values stored on the object, which we just read).

**With Brackets `()` :**

* **Purpose :** Calls a method of the DataFrame.
* **Usage :** Methods perform operations on the DataFrame and often return a new DataFrame or other data structures.
* **Example :** `df.head()` , `df.tail()` , `df.info()` , etc ...

**Without Brackets `()` :**

* **Purpose :** Accesses an attribute of the DataFrame.
* **Usage :** Attributes provide metadata or properties of the DataFrame.
* **Example :** `df.index` , `df.columns` , `df.values` , etc ...
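
A short sketch of the difference, using a tiny made-up DataFrame. A common slip is to forget the brackets on a method, which returns the method object itself instead of its result.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

# Attributes: read without brackets.
print(df.shape)    # (number of rows, number of columns)
print(df.columns)

# Methods: called with brackets, optionally with arguments.
first_two = df.head(2)

# Without brackets we get the method object, not its result.
not_called = df.head
print(callable(not_called))  # a method can be called
print(callable(df.shape))    # a plain attribute such as shape cannot
```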

15. In pandas, `axis=0` refers to the index (rows) and `axis=1` refers to the columns. For example, with `.sort_index()` :
* axis=0: Sort by the index (row labels).
* axis=1: Sort by the column labels.
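
The same `axis` argument appears in many other methods. The sketch below uses a small made-up DataFrame to show it with `.sum()` and `.drop()`; note that for reductions such as `.sum()`, `axis=0` moves down the rows and so returns one result per column.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]}, index=["x", "y", "z"])

col_totals = df.sum(axis=0)  # moves down the rows: one total per column
row_totals = df.sum(axis=1)  # moves across the columns: one total per row

without_row = df.drop("x", axis=0)  # "x" is looked up in the row labels
without_col = df.drop("B", axis=1)  # "B" is looked up in the column labels

print(col_totals)
print(row_totals)
```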

### **10 Minutes to pandas :**

The commands from the "10 minutes to pandas" section of the pandas User Guide are shown below :

#### **Commands :**

In [1]:
# 1st command.

import pandas as pd
import numpy as np

In [2]:
# 2nd command.

s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
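
The output above has dtype `float64` even though most inputs are integers. The sketch below shows why, and how missing values behave; it only uses the same Series as the cell above.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(type(np.nan))      # NaN is a float value
print(s.dtype)           # so the integers are stored as float64
print(np.nan == np.nan)  # NaN never compares equal, not even to itself
print(s.isna().sum())    # count missing entries with isna(), not with ==
print(s.mean())          # reductions skip NaN by default
```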

In [3]:
# 3rd command.

dates = pd.date_range("20130101", periods=6)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [4]:
# 3rd command.

df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df

|  | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.914898 | 0.46682 | 1.187844 | 2.094013 |
| 2013-01-02 | 0.037618 | 1.055445 | 0.350276 | -0.898962 |
| 2013-01-03 | 0.429819 | -0.901887 | 0.062352 | -0.770226 |
| 2013-01-04 | -1.22936 | 0.111778 | 0.844087 | 0.754031 |
| 2013-01-05 | -0.268291 | -0.963755 | 1.142032 | -0.721736 |
| 2013-01-06 | -1.055914 | -0.655344 | -0.746705 | 0.125499 |


In [5]:
# 4th command.

# Make data frame using dictionary.

df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float64"),
        "D": np.array([3] * 4, dtype="int64"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)

df2

|  | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
| 1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
| 2 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
| 3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |


In [6]:
# 4th command.
# It shows data type.

df2.dtypes

A          float64
B    datetime64[s]
C          float64
D            int64
E         category
F           object
dtype: object

In [7]:
# 4th command.
# It shows data info.

df2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, 0 to 3
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype        
---  ------  --------------  -----        
 0   A       4 non-null      float64      
 1   B       4 non-null      datetime64[s]
 2   C       4 non-null      float64      
 3   D       4 non-null      int64        
 4   E       4 non-null      category     
 5   F       4 non-null      object       
dtypes: category(1), datetime64[s](1), float64(2), int64(1), object(1)
memory usage: 320.0+ bytes


In [8]:
# 5th command.
# View data.

df.head()

|  | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.914898 | 0.46682 | 1.187844 | 2.094013 |
| 2013-01-02 | 0.037618 | 1.055445 | 0.350276 | -0.898962 |
| 2013-01-03 | 0.429819 | -0.901887 | 0.062352 | -0.770226 |
| 2013-01-04 | -1.22936 | 0.111778 | 0.844087 | 0.754031 |
| 2013-01-05 | -0.268291 | -0.963755 | 1.142032 | -0.721736 |


In [9]:
# 5th command.

df.tail()

|  | A | B | C | D |
|---|---|---|---|---|
| 2013-01-02 | 0.037618 | 1.055445 | 0.350276 | -0.898962 |
| 2013-01-03 | 0.429819 | -0.901887 | 0.062352 | -0.770226 |
| 2013-01-04 | -1.22936 | 0.111778 | 0.844087 | 0.754031 |
| 2013-01-05 | -0.268291 | -0.963755 | 1.142032 | -0.721736 |
| 2013-01-06 | -1.055914 | -0.655344 | -0.746705 | 0.125499 |


In [10]:
# 5th command.

df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [11]:
# 5th command.

df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

In [12]:
# 5th command.
# Convert the DataFrame to a 2D NumPy array.

df.to_numpy()

array([[ 0.91489782,  0.46681973,  1.18784425,  2.09401346],
       [ 0.03761776,  1.05544511,  0.35027584, -0.89896212],
       [ 0.42981924, -0.90188684,  0.06235176, -0.77022605],
       [-1.22935964,  0.11177757,  0.84408717,  0.75403096],
       [-0.26829123, -0.96375509,  1.14203202, -0.72173633],
       [-1.05591413, -0.65534404, -0.7467054 ,  0.12549887]])
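
`df` converts cleanly because all of its columns are floats. A NumPy array has a single dtype for the whole array, so a DataFrame with mixed column types is converted to a common type. The sketch below shows this on two small made-up DataFrames.

```python
import pandas as pd

numeric = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
mixed = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

print(numeric.to_numpy().dtype)  # int and float columns share float64
print(mixed.to_numpy().dtype)    # numbers and strings only fit in object
```

Converting to `object` is slower and uses more memory, which matters for a frame like `df2` above that mixes floats, timestamps, categories and strings.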

In [13]:
# 5th command.

df.describe()

|  | A | B | C | D |
|---|---|---|---|---|
| count | 6.0 | 6.0 | 6.0 | 6.0 |
| mean | -0.195205 | -0.147824 | 0.473314 | 0.097103 |
| std | 0.835802 | 0.822796 | 0.744001 | 1.169181 |
| min | -1.22936 | -0.963755 | -0.746705 | -0.898962 |
| 25% | -0.859008 | -0.840251 | 0.134333 | -0.758104 |
| 50% | -0.115337 | -0.271783 | 0.597182 | -0.298119 |
| 75% | 0.331769 | 0.378059 | 1.067546 | 0.596898 |
| max | 0.914898 | 1.055445 | 1.187844 | 2.094013 |
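
Each row of this summary can also be computed on its own. The sketch below checks that on a fresh random DataFrame; the fixed seed is an assumption added here so that the run is repeatable, and the numbers will not match the table above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
df = pd.DataFrame(rng.standard_normal((6, 4)), columns=list("ABCD"))

summary = df.describe()

# Each row of the summary matches the corresponding individual method.
print(np.allclose(summary.loc["mean"], df.mean()))
print(np.allclose(summary.loc["std"], df.std()))   # sample standard deviation
print(np.allclose(summary.loc["50%"], df.median()))
print(np.allclose(summary.loc["25%"], df.quantile(0.25)))
```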


In [14]:
# 5th command.

df.T

|  | 2013-01-01 | 2013-01-02 | 2013-01-03 | 2013-01-04 | 2013-01-05 | 2013-01-06 |
|---|---|---|---|---|---|---|
| A | 0.914898 | 0.037618 | 0.429819 | -1.22936 | -0.268291 | -1.055914 |
| B | 0.46682 | 1.055445 | -0.901887 | 0.111778 | -0.963755 | -0.655344 |
| C | 1.187844 | 0.350276 | 0.062352 | 0.844087 | 1.142032 | -0.746705 |
| D | 2.094013 | -0.898962 | -0.770226 | 0.754031 | -0.721736 | 0.125499 |


In [15]:
# 5th command.
# axis=0 : Sort by the index (rows).
# axis=1 : Sort by the columns.

df.sort_index(axis=1, ascending=False)

|  | D | C | B | A |
|---|---|---|---|---|
| 2013-01-01 | 2.094013 | 1.187844 | 0.46682 | 0.914898 |
| 2013-01-02 | -0.898962 | 0.350276 | 1.055445 | 0.037618 |
| 2013-01-03 | -0.770226 | 0.062352 | -0.901887 | 0.429819 |
| 2013-01-04 | 0.754031 | 0.844087 | 0.111778 | -1.22936 |
| 2013-01-05 | -0.721736 | 1.142032 | -0.963755 | -0.268291 |
| 2013-01-06 | 0.125499 | -0.746705 | -0.655344 | -1.055914 |


In [16]:
# 5th command.
# Sort by the values.
# Rows are sorted by column 'A' first; column 'B' only breaks ties between equal 'A' values.
# Every 'A' value here is unique, so 'B' does not change the order in this output.

df.sort_values(by=['A', 'B'])  # For more information see the pandas documentation.

|  | A | B | C | D |
|---|---|---|---|---|
| 2013-01-04 | -1.22936 | 0.111778 | 0.844087 | 0.754031 |
| 2013-01-06 | -1.055914 | -0.655344 | -0.746705 | 0.125499 |
| 2013-01-05 | -0.268291 | -0.963755 | 1.142032 | -0.721736 |
| 2013-01-02 | 0.037618 | 1.055445 | 0.350276 | -0.898962 |
| 2013-01-03 | 0.429819 | -0.901887 | 0.062352 | -0.770226 |
| 2013-01-01 | 0.914898 | 0.46682 | 1.187844 | 2.094013 |
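
Sorting by two columns only shows its effect when the first column has repeated values. The sketch below uses a small made-up DataFrame with ties in 'A' to make the second key visible.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 1, 2, 1], "B": [1, 4, 0, 3]})

# 'A' ascending, ties broken by 'B' ascending.
by_a_then_b = df.sort_values(by=["A", "B"])

# 'A' ascending, ties broken by 'B' descending.
mixed_order = df.sort_values(by=["A", "B"], ascending=[True, False])

print(by_a_then_b)
print(mixed_order)
```

Passing a list to `ascending` sets the direction for each sort key separately.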


In [17]:
# Show Whole Data Frame.

df

|  | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.914898 | 0.46682 | 1.187844 | 2.094013 |
| 2013-01-02 | 0.037618 | 1.055445 | 0.350276 | -0.898962 |
| 2013-01-03 | 0.429819 | -0.901887 | 0.062352 | -0.770226 |
| 2013-01-04 | -1.22936 | 0.111778 | 0.844087 | 0.754031 |
| 2013-01-05 | -0.268291 | -0.963755 | 1.142032 | -0.721736 |
| 2013-01-06 | -1.055914 | -0.655344 | -0.746705 | 0.125499 |


In [18]:
# 6th command.
# Getting or Getitem ([]) (very important).
# Passing a list of column names selects those columns.

df[['A', 'B']]

|  | A | B |
|---|---|---|
| 2013-01-01 | 0.914898 | 0.46682 |
| 2013-01-02 | 0.037618 | 1.055445 |
| 2013-01-03 | 0.429819 | -0.901887 |
| 2013-01-04 | -1.22936 | 0.111778 |
| 2013-01-05 | -0.268291 | -0.963755 |
| 2013-01-06 | -1.055914 | -0.655344 |


In [19]:
# 6th command.
# Index slicing (very important).

df[0:4]  # Slicing with [] selects rows only (here the first four rows).

|  | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.914898 | 0.46682 | 1.187844 | 2.094013 |
| 2013-01-02 | 0.037618 | 1.055445 | 0.350276 | -0.898962 |
| 2013-01-03 | 0.429819 | -0.901887 | 0.062352 | -0.770226 |
| 2013-01-04 | -1.22936 | 0.111778 | 0.844087 | 0.754031 |


In [20]:
# 6th command.

df.iloc[0:4, 0:3]  # It selects both rows and columns by integer position.

|  | A | B | C |
|---|---|---|---|
| 2013-01-01 | 0.914898 | 0.46682 | 1.187844 |
| 2013-01-02 | 0.037618 | 1.055445 | 0.350276 |
| 2013-01-03 | 0.429819 | -0.901887 | 0.062352 |
| 2013-01-04 | -1.22936 | 0.111778 | 0.844087 |


In [21]:
# 7th command.
# Selection by label (.loc) or by position (.iloc).
# It is also called data selection or data slicing.

df.loc[:, ['A', 'B']]

|  | A | B |
|---|---|---|
| 2013-01-01 | 0.914898 | 0.46682 |
| 2013-01-02 | 0.037618 | 1.055445 |
| 2013-01-03 | 0.429819 | -0.901887 |
| 2013-01-04 | -1.22936 | 0.111778 |
| 2013-01-05 | -0.268291 | -0.963755 |
| 2013-01-06 | -1.055914 | -0.655344 |


In [22]:
# 7th command.

df.iloc[3]

A   -1.229360
B    0.111778
C    0.844087
D    0.754031
Name: 2013-01-04 00:00:00, dtype: float64

In [23]:
# 7th command.

df.loc[:, :]

|  | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.914898 | 0.46682 | 1.187844 | 2.094013 |
| 2013-01-02 | 0.037618 | 1.055445 | 0.350276 | -0.898962 |
| 2013-01-03 | 0.429819 | -0.901887 | 0.062352 | -0.770226 |
| 2013-01-04 | -1.22936 | 0.111778 | 0.844087 | 0.754031 |
| 2013-01-05 | -0.268291 | -0.963755 | 1.142032 | -0.721736 |
| 2013-01-06 | -1.055914 | -0.655344 | -0.746705 | 0.125499 |
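
One difference between `.loc` and `.iloc` is easy to miss: a label slice includes its end label, while a position slice excludes its end position. The sketch below rebuilds a frame with the same index and columns as `df`, but with simple integer values (an assumption made here so the result is repeatable).

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

by_label = df.loc["2013-01-01":"2013-01-04", ["A", "B"]]  # end label included: 4 rows
by_position = df.iloc[0:3, 0:2]                           # end position excluded: 3 rows

# Single values: .at takes labels, .iat takes integer positions.
same_cell = df.at[dates[0], "A"] == df.iat[0, 0]

print(by_label)
print(by_position)
print(same_cell)
```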


In [24]:
# 7th command.
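# Let's work on a larger dataset.
# sns.load_dataset fetches the data online on first use, so it needs an internet connection.

import seaborn as sns

ship = sns.load_dataset('titanic')
ship

|  | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 0 | 2 | male | 27.0 | 0 | 0 | 13.0000 | S | Second | man | True | NaN | Southampton | no | True |
| 887 | 1 | 1 | female | 19.0 | 0 | 0 | 30.0000 | S | First | woman | False | B | Southampton | yes | True |
| 888 | 0 | 3 | female | NaN | 1 | 2 | 23.4500 | S | Third | woman | False | NaN | Southampton | no | False |
| 889 | 1 | 1 | male | 26.0 | 0 | 0 | 30.0000 | C | First | man | True | C | Cherbourg | yes | True |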


In [25]:
# 7th command.

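ship.sample(100)  # 100 random rows, so the output differs on every run.

|  | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 755 | 1 | 2 | male | 0.67 | 1 | 1 | 14.5000 | S | Second | child | False | NaN | Southampton | yes | False |
| 484 | 1 | 1 | male | 25.00 | 1 | 0 | 91.0792 | C | First | man | True | B | Cherbourg | yes | False |
| 515 | 0 | 1 | male | 47.00 | 0 | 0 | 34.0208 | S | First | man | True | D | Southampton | no | True |
| 592 | 0 | 3 | male | 47.00 | 0 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | True |
| 128 | 1 | 3 | female | NaN | 1 | 1 | 22.3583 | C | Third | woman | False | F | Cherbourg | yes | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 707 | 1 | 1 | male | 42.00 | 0 | 0 | 26.2875 | S | First | man | True | E | Southampton | yes | True |
| 851 | 0 | 3 | male | 74.00 | 0 | 0 | 7.7750 | S | Third | man | True | NaN | Southampton | no | True |
| 647 | 1 | 1 | male | 56.00 | 0 | 0 | 35.5000 | C | First | man | True | A | Cherbourg | yes | True |
| 19 | 1 | 3 | female | NaN | 0 | 0 | 7.2250 | C | Third | woman | False | NaN | Cherbourg | yes | True |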


In [26]:
# 8th command.
# Boolean indexing.

ship[ship.age < 5] # It is also called data filtering.

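|  | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 0 | 3 | male | 2.0 | 3 | 1 | 21.075 | S | Third | child | False | NaN | Southampton | no | False |
| 10 | 1 | 3 | female | 4.0 | 1 | 1 | 16.7 | S | Third | child | False | G | Southampton | yes | False |
| 16 | 0 | 3 | male | 2.0 | 4 | 1 | 29.125 | Q | Third | child | False | NaN | Queenstown | no | False |
| 43 | 1 | 2 | female | 3.0 | 1 | 2 | 41.5792 | C | Second | child | False | NaN | Cherbourg | yes | False |
| 63 | 0 | 3 | male | 4.0 | 3 | 2 | 27.9 | S | Third | child | False | NaN | Southampton | no | False |
| 78 | 1 | 2 | male | 0.83 | 0 | 2 | 29.0 | S | Second | child | False | NaN | Southampton | yes | False |
| 119 | 0 | 3 | female | 2.0 | 4 | 2 | 31.275 | S | Third | child | False | NaN | Southampton | no | False |
| 164 | 0 | 3 | male | 1.0 | 4 | 1 | 39.6875 | S | Third | child | False | NaN | Southampton | no | False |
| 171 | 0 | 3 | male | 4.0 | 4 | 1 | 29.125 | Q | Third | child | False | NaN | Queenstown | no | False |
| 172 | 1 | 3 | female | 1.0 | 1 | 1 | 11.1333 | S | Third | child | False | NaN | Southampton | yes | False |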


In [27]:
# 8th command.

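ship[ship.fare == 0][['survived', 'age', 'fare']]

|  | survived | age | fare |
|---|---|---|---|
| 179 | 0 | 36.0 | 0.0 |
| 263 | 0 | 40.0 | 0.0 |
| 271 | 1 | 25.0 | 0.0 |
| 277 | 0 | NaN | 0.0 |
| 302 | 0 | 19.0 | 0.0 |
| 413 | 0 | NaN | 0.0 |
| 466 | 0 | NaN | 0.0 |
| 481 | 0 | NaN | 0.0 |
| 597 | 0 | 49.0 | 0.0 |
| 633 | 0 | NaN | 0.0 |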


In [28]:
# 8th command.

ship[ship['fare'] < 5]

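|  | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 179 | 0 | 3 | male | 36.0 | 0 | 0 | 0.0 | S | Third | man | True | NaN | Southampton | no | True |
| 263 | 0 | 1 | male | 40.0 | 0 | 0 | 0.0 | S | First | man | True | B | Southampton | no | True |
| 271 | 1 | 3 | male | 25.0 | 0 | 0 | 0.0 | S | Third | man | True | NaN | Southampton | yes | True |
| 277 | 0 | 2 | male | NaN | 0 | 0 | 0.0 | S | Second | man | True | NaN | Southampton | no | True |
| 302 | 0 | 3 | male | 19.0 | 0 | 0 | 0.0 | S | Third | man | True | NaN | Southampton | no | True |
| 378 | 0 | 3 | male | 20.0 | 0 | 0 | 4.0125 | C | Third | man | True | NaN | Cherbourg | no | True |
| 413 | 0 | 2 | male | NaN | 0 | 0 | 0.0 | S | Second | man | True | NaN | Southampton | no | True |
| 466 | 0 | 2 | male | NaN | 0 | 0 | 0.0 | S | Second | man | True | NaN | Southampton | no | True |
| 481 | 0 | 2 | male | NaN | 0 | 0 | 0.0 | S | Second | man | True | NaN | Southampton | no | True |
| 597 | 0 | 3 | male | 49.0 | 0 | 0 | 0.0 | S | Third | man | True | NaN | Southampton | no | True |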
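
Filters can also be combined. The sketch below uses a tiny made-up table (`people`) that borrows a few Titanic column names, so it runs without downloading anything; the values are illustrative and not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical mini table that reuses a few Titanic column names.
people = pd.DataFrame(
    {
        "age": [2.0, 36.0, np.nan, 4.0, 25.0],
        "fare": [21.075, 0.0, 0.0, 16.7, 7.25],
        "sex": ["male", "male", "male", "female", "male"],
        "embarked": ["S", "S", "S", "S", "C"],
    }
)

# Wrap each condition in brackets and combine with & (and), | (or), ~ (not).
young_and_female = people[(people["age"] < 5) & (people["sex"] == "female")]
free_or_child = people[(people["fare"] == 0) | (people["age"] < 5)]
not_free = people[~(people["fare"] == 0)]

# .isin() tests membership in a list of values.
from_s_or_q = people[people["embarked"].isin(["S", "Q"])]

# .query() expresses the same kind of filter as a string.
same_with_query = people.query("age < 5 and sex == 'female'")

print(young_and_female)
print(free_or_child)
```

A comparison with a missing age is `False`, so the row with `NaN` age is only kept by the `fare == 0` condition.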


## ***The End.***