## 📚 Essential Basic Functionality

> Pandas is a foundational library in Python for working with structured data. It provides fast, flexible, and expressive tools designed to make data analysis and manipulation easy and intuitive. There are several essential functionalities that are fundamental to using Pandas effectively.

In [1]:
import pandas as pd
import numpy as np

### Head and tail

- `head()` shows the first 5 rows by default.
- `tail()` shows the last 5 rows by default.
- Both accept a different number of rows as an argument.

In [2]:
long_series = pd.Series(np.random.randn(10))
long_series

0    0.973173
1    0.717283
2   -1.060321
3   -1.682200
4    1.134680
5   -0.467489
6    0.217820
7   -0.381274
8    1.022441
9   -0.664676
dtype: float64

In [3]:
# Head
long_series.head()

0    0.973173
1    0.717283
2   -1.060321
3   -1.682200
4    1.134680
dtype: float64

In [4]:
# Tail
long_series.tail()

5   -0.467489
6    0.217820
7   -0.381274
8    1.022441
9   -0.664676
dtype: float64
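Both methods take an optional row count. The sketch below uses a small deterministic Series (`s` is a stand-in name, not part of the notebook above) to show a custom `n` and the negative-`n` form, which excludes rows rather than selecting them.

```python
import numpy as np
import pandas as pd

# A deterministic Series (values 0..9) so the slices are easy to check by eye.
s = pd.Series(np.arange(10))

first_three = s.head(3)        # first 3 rows
last_two = s.tail(2)           # last 2 rows
all_but_last_two = s.head(-2)  # negative n: everything except the last 2 rows

print(first_three)
print(last_two)
print(len(all_but_last_two))
```

The same calls work on a DataFrame, where they select whole rows.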

### Attributes and underlying data

- shape: gives the axis dimensions of the object, consistent with ndarray
- Axis labels:
    - Series: index (only axis)
    - DataFrame: index and columns


In [5]:
index = pd.date_range('20230101', periods=10)
df = pd.DataFrame(np.random.randn(10, 4), index=index, columns=list('ABCD'))
df

Unnamed: 0,A,B,C,D
2023-01-01,-0.383576,-0.879254,0.63909,0.25535
2023-01-02,-0.068703,1.10794,1.874308,2.248359
2023-01-03,0.663555,1.769808,1.814589,-1.294594
2023-01-04,0.051002,-0.901208,2.426192,-0.075746
2023-01-05,-0.683598,-0.155574,-0.520885,-1.261785
2023-01-06,-0.078052,-0.504483,-0.277675,0.719818
2023-01-07,-1.085607,1.058821,-1.256169,0.634839
2023-01-08,0.327605,0.861452,-0.957489,1.690867
2023-01-09,0.716459,0.773287,-0.815275,1.528035
2023-01-10,-1.124508,-1.294064,0.250698,0.209566


In [6]:
df.columns = [x.lower() for x in df.columns]
df

Unnamed: 0,a,b,c,d
2023-01-01,-0.383576,-0.879254,0.63909,0.25535
2023-01-02,-0.068703,1.10794,1.874308,2.248359
2023-01-03,0.663555,1.769808,1.814589,-1.294594
2023-01-04,0.051002,-0.901208,2.426192,-0.075746
2023-01-05,-0.683598,-0.155574,-0.520885,-1.261785
2023-01-06,-0.078052,-0.504483,-0.277675,0.719818
2023-01-07,-1.085607,1.058821,-1.256169,0.634839
2023-01-08,0.327605,0.861452,-0.957489,1.690867
2023-01-09,0.716459,0.773287,-0.815275,1.528035
2023-01-10,-1.124508,-1.294064,0.250698,0.209566
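The attributes listed above can be inspected directly. This is a minimal sketch on a stand-in DataFrame (`frame`, filled with zeros) with the same shape and labels as `df`; the names are illustrative only.

```python
import numpy as np
import pandas as pd

# Stand-in frame with the same index and column labels as the example above.
index = pd.date_range("20230101", periods=10)
frame = pd.DataFrame(np.zeros((10, 4)), index=index, columns=list("abcd"))

print(frame.shape)          # (rows, columns) tuple, as with an ndarray
print(frame.index[:2])      # row labels (a DatetimeIndex here)
print(list(frame.columns))  # column labels

# A Series has a single axis, so its shape is a one-element tuple.
col = frame["a"]
print(col.shape)
print(col.index.equals(frame.index))
```

Assigning to `columns` or `index`, as in the cell above, replaces the labels without touching the data.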


#### NumPy

- `to_numpy()` is the recommended way to convert pandas objects to NumPy arrays. It behaves consistently across dtypes, gives control over the resulting dtype, and handles extension types better than older approaches such as `.values`.

In [7]:
df_numpy = df.to_numpy()
df_numpy

array([[-0.38357604, -0.87925397,  0.63908979,  0.25535014],
       [-0.06870283,  1.10793983,  1.87430836,  2.24835927],
       [ 0.66355541,  1.76980776,  1.81458866, -1.29459416],
       [ 0.05100165, -0.90120753,  2.42619236, -0.07574637],
       [-0.68359847, -0.15557431, -0.52088535, -1.26178462],
       [-0.07805197, -0.50448271, -0.27767525,  0.71981847],
       [-1.08560696,  1.0588213 , -1.25616859,  0.63483914],
       [ 0.32760532,  0.86145232, -0.95748874,  1.69086688],
       [ 0.71645923,  0.77328713, -0.81527474,  1.52803502],
       [-1.12450828, -1.29406448,  0.2506981 ,  0.20956627]])
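To illustrate the dtype control mentioned above, here is a small sketch using the `dtype` and `na_value` parameters of `to_numpy()`. The frames and names (`mixed`, `nullable`) are made up for the example.

```python
import numpy as np
import pandas as pd

# Columns with different dtypes are converted to one common dtype.
mixed = pd.DataFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})
arr = mixed.to_numpy()
print(arr.dtype)  # int64 and float64 columns share float64

# An explicit dtype can be requested instead.
arr32 = mixed.to_numpy(dtype="float32")
print(arr32.dtype)

# For a nullable extension dtype, na_value chooses how missing entries
# are represented in the resulting array.
nullable = pd.Series([1, 2, None], dtype="Int64")
as_float = nullable.to_numpy(dtype="float64", na_value=np.nan)
print(as_float)
```

When all columns share one NumPy dtype the conversion is cheap; mixed or extension dtypes may require a copy.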

## Matching / broadcasting behavior

In [8]:
df_mathing_broadcasting = pd.DataFrame(
    {
        "one": pd.Series(np.random.randint(0, 10, 3), index=["a", "b", "c"]),
        "two": pd.Series(np.random.randint(0, 10, 3), index=["a", "b", "c"]),
        "three": pd.Series(np.random.randint(0, 10, 4), index=["a", "b", "c", "d"]),

    }
)
df_mathing_broadcasting

Unnamed: 0,one,two,three
a,6.0,8.0,0
b,5.0,0.0,9
c,1.0,4.0,1
d,,,9


### Sub

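- `sub()` subtracts a Series or DataFrame from another element-wise, aligning on labels. With `axis="columns"`, a Series is matched against the column labels and subtracted from every row.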

In [9]:
df_sub_l1 = df_mathing_broadcasting.iloc[2]
df_sub_l1

one      1.0
two      4.0
three    1.0
Name: c, dtype: float64

In [10]:
df_mathing_broadcasting.sub(df_sub_l1, axis="columns")

Unnamed: 0,one,two,three
a,5.0,4.0,-1.0
b,4.0,-4.0,8.0
c,0.0,0.0,0.0
d,,,8.0
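The `axis` argument decides which labels the Series is matched against. The sketch below contrasts the two directions on a small fixed DataFrame (`frame` and its values are chosen for illustration, not taken from the random data above).

```python
import pandas as pd

frame = pd.DataFrame(
    {"one": [6.0, 5.0, 1.0], "two": [8.0, 0.0, 4.0]},
    index=["a", "b", "c"],
)

# Match on column labels: the row "c" is subtracted from every row.
row = frame.loc["c"]
by_row = frame.sub(row, axis="columns")  # equivalent to frame - row

# Match on index labels: the column "one" is subtracted from every column.
col = frame["one"]
by_col = frame.sub(col, axis="index")

print(by_row)
print(by_col)
```

The plain `-` operator only covers the column-matching case for a Series; `sub()` is needed when matching on the index.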


## Missing data / operations with fill values

In [30]:
df_filldata = df.copy()
df_filldata.iloc[2, 2] = np.nan
df_filldata.head()

Unnamed: 0,a,b,c,d
2023-01-01,-0.383576,-0.879254,0.63909,0.25535
2023-01-02,-0.068703,1.10794,1.874308,2.248359
2023-01-03,0.663555,1.769808,,-1.294594
2023-01-04,0.051002,-0.901208,2.426192,-0.075746
2023-01-05,-0.683598,-0.155574,-0.520885,-1.261785


In [31]:
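# fillna() returns a new DataFrame; the result is not assigned here,
# so df_filldata itself still contains the NaN.
df_filldata.fillna(np.random.rand())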
df_filldata.head()

Unnamed: 0,a,b,c,d
2023-01-01,-0.383576,-0.879254,0.63909,0.25535
2023-01-02,-0.068703,1.10794,1.874308,2.248359
2023-01-03,0.663555,1.769808,,-1.294594
2023-01-04,0.051002,-0.901208,2.426192,-0.075746
2023-01-05,-0.683598,-0.155574,-0.520885,-1.261785
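Because `fillna()` returns a new object, the filled result has to be captured to be used. The sketch below shows that, and also the `fill_value` argument of the arithmetic methods, which is one reading of "operations with fill values". The frames (`frame`, `other`) and the fill value `0` are assumptions for the example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# Capture the result; frame itself is left unchanged.
filled = frame.fillna(0.0)
print(frame.isna().sum().sum())   # missing values remain in the original
print(filled.isna().sum().sum())  # none in the filled copy

# Arithmetic with fill_value: a value missing on one side is treated as 0,
# but a position missing on both sides stays NaN.
other = pd.DataFrame({"a": [10.0, 20.0, np.nan], "b": [1.0, 1.0, np.nan]})
total = frame.add(other, fill_value=0)
print(total)
```

A fixed fill value is used here instead of `np.random.rand()` so the result is reproducible.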
