> # **Pandas Introduction**

- Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was created by Wes McKinney in 2008.
##  **Why Use Pandas?**
 Pandas allows us to analyze big data and make conclusions based on statistical theories.

Pandas can clean messy data sets and make them readable and relevant.

Relevant data is very important in data science.
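To make "clean messy data" concrete, here is a minimal sketch on a small made-up table (the `raw` and `clean` names and all values are hypothetical). It uses two standard pandas methods, `drop_duplicates()` and `dropna()`; real-world cleaning usually takes more steps than this.

```python
import pandas as pd

# A small, made-up table with a duplicated row and a missing name
raw = pd.DataFrame(
    {
        "name": ["Ali", "Sara", "Sara", None],
        "age": [25, 31, 31, 40],
    }
)

# Drop exact duplicate rows, then drop rows where "name" is missing
clean = raw.drop_duplicates().dropna(subset=["name"])
print(clean)
```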

> # **Installation of Pandas**
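Install it from a terminal with `pip install pandas`, or from a Jupyter notebook cell with the `%pip` magic, which installs into the notebook's own environment:

```
%pip install pandas
```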

```text
Collecting pandas
  Downloading pandas-2.1.1-cp311-cp311-win_amd64.whl.metadata (18 kB)
Collecting numpy>=1.23.2 (from pandas)
  Downloading numpy-1.26.1-cp311-cp311-win_amd64.whl.metadata (61 kB)
Collecting pytz>=2020.1 (from pandas)
  Downloading pytz-2023.3.post1-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.1 (from pandas)
  Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)
```

### **Import Pandas**


```python
import pandas as pd
import numpy as np
```
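A quick way to confirm that the installation worked is to print the versions the imports picked up. This is a minimal check; the numbers you see depend on your environment (the install log above shows pandas 2.1.1 and numpy 1.26.1 being downloaded).

```python
import numpy as np
import pandas as pd

# Print the installed versions; the exact numbers depend on your environment
print("pandas:", pd.__version__)
print("numpy:", np.__version__)
```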

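```python
# Create a Series: a one-dimensional labelled array.
# np.nan marks a missing value, so pandas stores the whole Series as float64.
s = pd.Series([1, 2, 3, 4, np.nan, 5, 12, 15])
s
```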

```text
0     1.0
1     2.0
2     3.0
3     4.0
4     NaN
5     5.0
6    12.0
7    15.0
dtype: float64
```
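A few Series operations follow directly from the output above. In this sketch (the `labelled` Series is a hypothetical extra example), the dtype is float64 because int64 cannot represent NaN, `isna()` flags the missing entry, and `mean()` skips NaN by default, so it averages the seven remaining values (42 / 7).

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, np.nan, 5, 12, 15])

# int64 cannot hold NaN, so pandas upcasts the whole Series to float64
print(s.dtype)          # float64
# isna() marks missing values; sum() counts the True entries
print(s.isna().sum())   # 1
# Aggregations skip NaN by default (skipna=True): 42 / 7 values
print(s.mean())         # 6.0

# A Series can also carry its own labels instead of 0, 1, 2, ...
labelled = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(labelled["b"])    # 20
```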

```python
# Create a DatetimeIndex of 23 consecutive days (default frequency "D")
dates = pd.date_range("20231001", periods=23)
dates
```


```text
DatetimeIndex(['2023-10-01', '2023-10-02', '2023-10-03', '2023-10-04',
               '2023-10-05', '2023-10-06', '2023-10-07', '2023-10-08',
               '2023-10-09', '2023-10-10', '2023-10-11', '2023-10-12',
               '2023-10-13', '2023-10-14', '2023-10-15', '2023-10-16',
               '2023-10-17', '2023-10-18', '2023-10-19', '2023-10-20',
               '2023-10-21', '2023-10-22', '2023-10-23'],
              dtype='datetime64[ns]', freq='D')
```
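`pd.date_range` accepts other combinations of arguments too. This sketch shows that `periods=23` starting on 1 October yields the same dates as an explicit `end` date, and that `freq` sets the step between dates (the variable names are illustrative).

```python
import pandas as pd

# periods=23 with the default daily frequency ("D") ...
by_periods = pd.date_range("20231001", periods=23)
# ... gives the same dates as an explicit start and end
by_end = pd.date_range(start="2023-10-01", end="2023-10-23")
print(by_periods.equals(by_end))   # True

# freq changes the step, e.g. every 2 days
every_other = pd.date_range("2023-10-01", periods=4, freq="2D")
print(list(every_other.strftime("%Y-%m-%d")))
# ['2023-10-01', '2023-10-03', '2023-10-05', '2023-10-07']
```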

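```python
# Build a DataFrame (a table of rows and columns, like a worksheet):
# 23 rows of random numbers, indexed by the dates above, with columns A, K, I, F.
# np.random.randn draws fresh values on every run, so your numbers will differ.
df = pd.DataFrame(np.random.randn(23, 4), index=dates, columns=list("AKIF"))
df
```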


|            | A         | K         | I         | F         |
|------------|-----------|-----------|-----------|-----------|
| 2023-10-01 | -0.346512 | 0.933939  | -0.218141 | 1.485174  |
| 2023-10-02 | 0.275349  | 0.569675  | -0.227527 | 0.449468  |
| 2023-10-03 | 0.511094  | 1.590835  | -1.135967 | 0.232729  |
| 2023-10-04 | 0.218746  | -0.391694 | -1.420178 | -1.193110 |
| 2023-10-05 | -1.499788 | -1.385468 | 1.311430  | -0.539214 |
| 2023-10-06 | 0.682058  | 0.425909  | 0.188176  | -0.495463 |
| 2023-10-07 | -0.244035 | -0.183712 | 1.790918  | 1.312155  |
| 2023-10-08 | -2.318240 | 0.140171  | 0.627447  | 0.078560  |
| 2023-10-09 | -1.174127 | 0.966082  | -1.334367 | -1.357558 |
| 2023-10-10 | 1.988190  | 0.718297  | -2.181567 | -0.977640 |
| ...        | ...       | ...       | ...       | ...       |
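With a DataFrame in hand, a few attributes and accessors help explore it. The sketch below rebuilds `df` so it runs on its own; `col_a` and `row` are illustrative names. `df["A"]` selects a column, and `df.loc` selects a row by its label, here a `pd.Timestamp` taken from the date index.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20231001", periods=23)
df = pd.DataFrame(np.random.randn(23, 4), index=dates, columns=list("AKIF"))

print(df.shape)            # (23, 4): 23 rows, 4 columns
print(list(df.columns))    # ['A', 'K', 'I', 'F']

# Select one column (a Series) and one row by its date label (also a Series)
col_a = df["A"]
row = df.loc[pd.Timestamp("2023-10-05")]
print(type(col_a).__name__, type(row).__name__)   # Series Series

# Summary statistics per column (count, mean, std, min, quartiles, max)
print(df.describe())
```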


```python
df.head()  # first 5 rows (the default; pass n, e.g. df.head(3), for another count)
```

|            | A         | K         | I         | F         |
|------------|-----------|-----------|-----------|-----------|
| 2023-10-01 | -0.346512 | 0.933939  | -0.218141 | 1.485174  |
| 2023-10-02 | 0.275349  | 0.569675  | -0.227527 | 0.449468  |
| 2023-10-03 | 0.511094  | 1.590835  | -1.135967 | 0.232729  |
| 2023-10-04 | 0.218746  | -0.391694 | -1.420178 | -1.193110 |
| 2023-10-05 | -1.499788 | -1.385468 | 1.311430  | -0.539214 |
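`head()` takes an optional row count `n` that defaults to 5. A minimal sketch, rebuilding a random `df` so it runs standalone (`first_five` and `first_three` are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(23, 4),
                  index=pd.date_range("20231001", periods=23),
                  columns=list("AKIF"))

first_five = df.head()     # n defaults to 5
first_three = df.head(3)   # pass n to choose how many rows
print(len(first_five), len(first_three))   # 5 3

# head(n) gives the same rows as positional slicing with iloc
print(first_three.equals(df.iloc[:3]))     # True
```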


```python
df.tail(4)  # last 4 rows (without an argument, tail() returns 5)
```

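|            | A         | K         | I         | F         |
|------------|-----------|-----------|-----------|-----------|
| 2023-10-20 | 1.013454  | 0.418327  | 0.309507  | -1.306072 |
| 2023-10-21 | -0.305762 | 0.313118  | 0.471534  | -0.114266 |
| 2023-10-22 | -0.602520 | -0.679566 | -1.846277 | -0.207742 |
| 2023-10-23 | 0.024625  | -0.424597 | 0.415550  | 1.478186  |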
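`tail(n)` mirrors `head(n)` from the other end of the table. This sketch (again rebuilding a random `df`; `last_four` is an illustrative name) confirms which dates the last four rows cover and that the result matches a negative `iloc` slice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(23, 4),
                  index=pd.date_range("20231001", periods=23),
                  columns=list("AKIF"))

last_four = df.tail(4)
# The last four daily labels run from Oct 20 to Oct 23
print(last_four.index[0].date(), last_four.index[-1].date())   # 2023-10-20 2023-10-23

# tail(n) gives the same rows as positional slicing from the end
print(last_four.equals(df.iloc[-4:]))   # True
```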


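```python
# Build a DataFrame from a dict: each key becomes a column name.
# Scalars ("K", "F") are broadcast to every row; the Series supplies the index 0-3.
# Keys must be unique: a repeated key (e.g. "A" twice) silently keeps only the last value.
df2 = pd.DataFrame(
    {
        "A": pd.Categorical(["girl", "woman", "boy", "man"]),
        "K": pd.Timestamp("20231023"),
        "I": pd.Series(1, index=list(range(4)), dtype="float32"),
        "F": "human",
    }
)
df2
```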

|   | A     | K          | I   | F     |
|---|-------|------------|-----|-------|
| 0 | girl  | 2023-10-23 | 1.0 | human |
| 1 | woman | 2023-10-23 | 1.0 | human |
| 2 | boy   | 2023-10-23 | 1.0 | human |
| 3 | man   | 2023-10-23 | 1.0 | human |
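The dictionary passed to `pd.DataFrame` follows ordinary Python rules: if a key appears twice, only the last value survives, so only one column with that name is created, while scalar values are broadcast to every row. A small sketch with hypothetical keys and values:

```python
import pandas as pd

# A dict literal with a repeated key silently keeps only the last value
data = {"A": 1.0, "F": "human", "A": "girl"}
print(data)                  # {'A': 'girl', 'F': 'human'}

# So the DataFrame gets a single "A" column; scalars fill every row
frame = pd.DataFrame(data, index=range(2))
print(list(frame.columns))   # ['A', 'F']
print(frame["A"].tolist())   # ['girl', 'girl']
print(frame["F"].tolist())   # ['human', 'human']
```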


```python
# Check the dtype of each column ("A" is categorical)
df2.dtypes
```

```text
A         category
K    datetime64[s]
I          float32
F           object
dtype: object
```
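Once the dtypes are known, they can be inspected and changed. This sketch rebuilds `df2` with the same columns as above (`converted` is an illustrative name): `.cat.categories` lists the distinct labels of the categorical column, `astype` converts selected columns, and `select_dtypes` filters columns by dtype.

```python
import pandas as pd

df2 = pd.DataFrame(
    {
        "A": pd.Categorical(["girl", "woman", "boy", "man"]),
        "K": pd.Timestamp("20231023"),
        "I": pd.Series(1, index=list(range(4)), dtype="float32"),
        "F": "human",
    }
)

# A categorical column stores each distinct label once; categories are sorted
print(list(df2["A"].cat.categories))   # ['boy', 'girl', 'man', 'woman']

# astype converts columns: here float32 -> float64 and "F" -> category
converted = df2.astype({"I": "float64", "F": "category"})
print(converted.dtypes["I"], converted.dtypes["F"])   # float64 category

# select_dtypes keeps only the numeric columns
print(list(df2.select_dtypes(include="number").columns))   # ['I']
```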