# Pandas



*   pandas is mainly used for working with structured data in Python, especially tabular data like spreadsheets, CSV files, and database tables.
*   pandas makes it easy to read and write data from many sources:
    - CSV, Excel
    - SQL databases
    - JSON, Parquet

*   pandas is built on top of NumPy.
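
As a minimal sketch of reading and writing tabular data, the cell below round-trips a small CSV through `pd.read_csv` and `DataFrame.to_csv`. The CSV text and the names `csv_text`, `people`, and `round_trip` are made up for illustration; an in-memory buffer stands in for a real file so the example runs without touching the disk.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,age\nAlice,25\nBob,30\n"

# read_csv accepts a file path or any file-like object.
people = pd.read_csv(io.StringIO(csv_text))
print(people)

# With no path given, to_csv returns the CSV as a string.
# index=False leaves the row labels out of the output.
round_trip = people.to_csv(index=False)
print(round_trip)
```

With a real file, the same calls would take a path instead of the buffer.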



In [1]:
import numpy as np
import pandas as pd

# Pandas Series


*   A Series is a one-dimensional labeled array that can hold any data type (integers, floats, strings, objects).
*   Think of it as:
    - A single column of a table
    - NumPy array with labels (index)
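
To make the "NumPy array with labels" idea concrete, here is a small sketch that wraps an array in a Series without giving an index. The names `arr` and `s_default` are illustrative only.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])

# No index is given, so pandas labels the items 0, 1, 2.
s_default = pd.Series(arr)

print(s_default.index)       # the default integer labels
print(s_default[1])          # 20, looked up by the label 1
print(s_default.to_numpy())  # the underlying values as a NumPy array
```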

# With custom index (labels)

*   Instead of default index, we assign custom index/labels to each item

```
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
s["a"]
```
*   The label for the value 10 is "a"
*   The label for the value 20 is "b"
*   The label for the value 30 is "c"






In [12]:
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["a"])
print("-------------------")
print(s["a":"b"])
print("-------------------")
print(s.values)
print("-------------------")
print(type(s.values))
print("-------------------")
print(type(s))
print("-------------------")
print(s.index)

10
-------------------
a    10
b    20
dtype: int64
-------------------
[10 20 30]
-------------------
<class 'numpy.ndarray'>
-------------------
<class 'pandas.core.series.Series'>
-------------------
Index(['a', 'b', 'c'], dtype='object')


# Create Pandas Series from dictionary


*   The dictionary values become the Series values
*   The dictionary keys become the index labels



In [5]:
grades_dic = { "A": 4, "A-": 3.5, "B": 3, "B-": 2.5, "C": 2}

grades_series = pd.Series(grades_dic)

print(grades_series["A-"])
print("-------------------")
print(grades_series["A":"B"])
print("-------------------")
print(grades_series.values)
print("-------------------")
print(type(grades_series.values))
print("-------------------")
print(type(grades_series))
print("-------------------")
print(grades_series.index)

3.5
-------------------
A     4.0
A-    3.5
B     3.0
dtype: float64
-------------------
[4.  3.5 3.  2.5 2. ]
-------------------
<class 'numpy.ndarray'>
-------------------
<class 'pandas.core.series.Series'>
-------------------
Index(['A', 'A-', 'B', 'B-', 'C'], dtype='object')


In [8]:
marks_dic = { "A": 85, "A-": 80, "B": 75, "B-": 70, "C": 65}

marks_series = pd.Series(marks_dic)

print(marks_series)

A     85
A-    80
B     75
B-    70
C     65
dtype: int64


# Pandas DataFrame


*   The DataFrame is the core data structure in pandas and is used to work with tabular (row and column) data, like spreadsheets or database tables.

> A DataFrame is a two-dimensional, labeled data structure with:
> - Rows (index)
> - Columns (labels)
> - Potentially different data types per column
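
The point about different data types per column can be checked with the `dtypes` attribute. The sketch below uses a made-up frame (`people`, with a hypothetical `Height` column) purely for illustration.

```python
import pandas as pd

# Three columns holding text, integers, and floats respectively.
people = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "Height": [1.65, 1.80],
})

print(people.shape)   # (2, 3): two rows, three columns
print(people.dtypes)  # one dtype per column
```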



# Creating a DataFrame From a dictionary (most common)

In [16]:
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Chicago", "San Francisco"]
}

df = pd.DataFrame(data)
print(df)

      Name  Age           City
0    Alice   25       New York
1      Bob   30        Chicago
2  Charlie   35  San Francisco


# Creating a DataFrame From a list of dictionaries:

In [9]:
data = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob", "Age": 30}
]

df = pd.DataFrame(data)

df

    Name  Age
0  Alice   25
1    Bob   30


# Creating a DataFrame From series:


In [10]:
D = pd.DataFrame({"Marks": marks_series, "Graded": grades_series})

print(D)

    Marks  Graded
A      85     4.0
A-     80     3.5
B      75     3.0
B-     70     2.5
C      65     2.0


# Creating a Transpose of DataFrame

In [14]:
transpose_dataframe = D.T
print(transpose_dataframe)
print("---------------------")
print(D.columns)

           A    A-     B    B-     C
Marks   85.0  80.0  75.0  70.0  65.0
Graded   4.0   3.5   3.0   2.5   2.0
---------------------
Index(['Marks', 'Graded'], dtype='object')
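
In the transposed output, Marks prints as 85.0 rather than 85. A likely explanation is that each transposed column mixes the integer marks with the float grade points, so pandas stores the whole column as floats. The sketch below illustrates this on a smaller made-up frame (`scores` and `flipped` are illustrative names).

```python
import pandas as pd

scores = pd.DataFrame(
    {"Marks": [85, 80], "Points": [4.0, 3.5]},
    index=["A", "A-"],
)
print(scores.dtypes)   # Marks is an integer column, Points is a float column

# After transposing, each column holds one mark and one grade point,
# so every column ends up with a float dtype.
flipped = scores.T
print(flipped.dtypes)
```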


## Pandas NaN Values


*   In pandas, NaN represents missing or undefined data.
*   It’s a core concept because real-world datasets are almost never complete.
*   NaN stands for “Not a Number”, but in pandas it is used more broadly to mean:
    > “Missing value”



In [None]:
nan_values = pd.DataFrame([{"a":1, "b": 2},{"b":3, "c":4}])
print(nan_values)
print("------------------------------")
zeros_values = nan_values.fillna(0)
print(zeros_values)
print("------------------------------")
# dropna is a method and must be called; every row here contains a NaN, so no rows remain
drop_values = nan_values.dropna()
print(drop_values)
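
As a rough sketch of how missing values can be located before deciding what to drop, the cell below rebuilds the same two-row frame under the illustrative name `nan_demo` and compares dropping rows with dropping columns.

```python
import pandas as pd

nan_demo = pd.DataFrame([{"a": 1, "b": 2}, {"b": 3, "c": 4}])

# isna() marks the missing cells; summing counts them per column.
missing_per_column = nan_demo.isna().sum()
print(missing_per_column)

# Each row has at least one NaN, so the default dropna() removes every row.
rows_kept = nan_demo.dropna()
print(len(rows_kept))  # 0

# axis="columns" drops columns with missing values instead; only "b" is complete.
cols_kept = nan_demo.dropna(axis="columns")
print(list(cols_kept.columns))  # ['b']
```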

# Implicit Index (Default Index)

*   Automatically created by pandas
*   Integer-based: 0, 1, 2, 3, …
*   Created when you don’t specify an index
*   iloc always uses implicit (positional) indexing
    > df.iloc[1]
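
A minimal sketch of positional access with `iloc`, using a made-up frame named `staff` that keeps the default integer index:

```python
import pandas as pd

staff = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# iloc[1] returns the second row (positions start at 0).
second_row = staff.iloc[1]
print(second_row["Name"])  # Bob

# Positional slices exclude the end position, as in plain Python.
first_two = staff.iloc[0:2]
print(len(first_two))  # 2
```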

# Explicit Index (User-Defined Index)
*   Explicitly set by the user
*   Can be strings, dates, IDs, etc.
*   Represents meaningful labels
*   loc uses explicit (label-based) indexing
    > df.loc["emp2"]
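
A minimal sketch of label-based access with `loc`. The frame `team` and its labels `emp1` to `emp3` are hypothetical, chosen to match the `df.loc["emp2"]` example.

```python
import pandas as pd

team = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["emp1", "emp2", "emp3"],
)

# loc looks rows up by label.
bob = team.loc["emp2"]
print(bob["Name"])  # Bob

# Label slices include both endpoints.
by_label = team.loc["emp1":"emp2"]
print(len(by_label))  # 2

# iloc still works by position on the same frame, and excludes the end.
by_position = team.iloc[0:1]
print(len(by_position))  # 1
```

The inclusive label slice is the main difference to keep in mind when switching between `loc` and `iloc`.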