In [None]:

import pandas as pd
import numpy as np
from random import random

#### Introduction to Pandas
Pandas is a Python library used for data manipulation and analysis. It provides two key data structures:

- **Series**: A one-dimensional labeled array, similar to a column in a spreadsheet or database.

- **DataFrame**: A two-dimensional labeled data structure, like a table or spreadsheet with rows and columns.

Pandas makes it straightforward to load, clean, filter, and aggregate structured data.

*More precisely, both of these rely on a third structure, the Index:*

##### Pandas has three core structures:

- Series → 1-D labeled array (like a column in Excel).

- DataFrame → 2-D table (a collection of Series that share one row index).

- Index → label system for rows/columns (acts like a set with alignment logic).

All three are built on top of NumPy arrays, but they add labels, alignment, heterogeneous dtypes, and rich methods.
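As a minimal sketch of how the three structures relate, the snippet below builds a small DataFrame and inspects its parts. The `table` and `price` names and the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
table = pd.DataFrame(
    {"Price": [1200, 800, 500], "Stock": [15, 30, 25]},
    index=["Laptop", "Phone", "Tablet"],
)

price = table["Price"]             # selecting one column gives a Series
values = price.to_numpy()          # the values as a plain NumPy array

print(type(table).__name__)        # DataFrame
print(type(price).__name__)        # Series
print(type(table.index).__name__)  # Index
print(type(values).__name__)       # ndarray

# The column shares the row labels of the table it came from.
print(price.index.equals(table.index))
```

Selecting a column returns a Series that carries the same row labels as the table, which is what lets pandas line rows up by label later on.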

In [5]:
# from a Python list
s1 = pd.Series([10, 20, 30, 40])
print(s1)

# custom index
s1 = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])
print(s1)


0    10
1    20
2    30
3    40
dtype: int64
A    10
B    20
C    30
D    40
dtype: int64


In [7]:
# from a dict

data = {'Alice': 85, 'Bob': 90, 'Charlie': 78}
result = pd.Series(data)
result

# Keys become index labels, values become data.

Alice      85
Bob        90
Charlie    78
dtype: int64

In [9]:
# scalar + index: the scalar is repeated for every index label
s = pd.Series(5, index=['x','y','z'])
s

x    5
y    5
z    5
dtype: int64

In [15]:
# from NumPy array
arr = np.random.rand(4)
s = pd.Series(arr, index=list('WXYZ'))
s

W    0.653478
X    0.362264
Y    0.011001
Z    0.974031
dtype: float64

In [18]:
# Arithmetic aligns on index labels. A label missing from either Series → NaN (so the result is float64).

sA = pd.Series({'A': 1, 'B': 2, 'C': 3})
sB = pd.Series({'B': 10, 'C': 20, 'D': 30})
print(sA + sB)


A     NaN
B    12.0
C    23.0
D     NaN
dtype: float64


Internal working:

- A Series stores its values in a single array (a NumPy array for most dtypes), which pandas wraps in an internal block manager.
- The index is a separate Index object.
- Operations use vectorized NumPy ops where possible (C-speed).
- Arithmetic first aligns the operands on their index labels, which is what makes it possible to add or join Series whose labels differ.
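The points above can be sketched with the same two Series used earlier. `Series.add` with `fill_value` is one way to treat a missing label as 0 instead of getting NaN. Whether 0 is the right default depends on the data, so treat that choice as an assumption.

```python
import pandas as pd

sA = pd.Series({"A": 1, "B": 2, "C": 3})
sB = pd.Series({"B": 10, "C": 20, "D": 30})

# Vectorized: one expression is applied to every element, no Python loop.
doubled = sA * 2

# Default alignment: a label missing from either side gives NaN.
strict = sA + sB

# With fill_value, a label missing from one side is treated as 0.
filled = sA.add(sB, fill_value=0)

print(doubled)
print(strict)
print(filled)
```

With `fill_value=0`, labels `A` and `D` keep their single available value instead of becoming NaN.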

In [50]:
# Method 1: Dictionary of lists
record = {
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [1200, 800, 500],
    'Stock': [15, 30, 25]
}
data = pd.DataFrame(record)
print(data)



  Product  Price  Stock
0  Laptop   1200     15
1   Phone    800     30
2  Tablet    500     25


In [26]:
# Method 2: List of dictionaries
data2 = [
    {'Product': 'Laptop', 'Price': 1200, 'Stock': 15},
    {'Product': 'Phone', 'Price': 800, 'Stock': 30},
    {'Product': 'Tablet', 'Price': 500, 'Stock': 25}
]
df2 = pd.DataFrame(data2)
print(df2)


  Product  Price  Stock
0  Laptop   1200     15
1   Phone    800     30
2  Tablet    500     25


In [117]:

# Method 3: NumPy array with column names
arr = np.random.randint(0, 100, size=(5, 3))
data = pd.DataFrame(arr, columns=['Maths', 'Science', 'English'], index=['Student1', 'Student2', 'Student3', 'Student4', 'Student5'])

print(data)


          Maths  Science  English
Student1     98       59       16
Student2      3       68        0
Student3     47       27       64
Student4     18       11        6
Student5      6       33       28


In [132]:

# Method 4: From Series
s1 = pd.Series([100, 200, 300], name="Revenue")
s2 = pd.Series([80, 150, 250], name="Cost")
data = pd.DataFrame({"Revenue": s1, "Cost": s2})
print(data)

   Revenue  Cost
0      100    80
1      200   150
2      300   250


A DataFrame uses a BlockManager internally, which groups columns of the same dtype into blocks so that each block can be stored as one array.

When you perform arithmetic, pandas aligns indices first, then uses NumPy vectorized ops on the underlying arrays.

Index objects are immutable to ensure predictable alignment semantics.
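A short sketch of both behaviours: row alignment during DataFrame arithmetic, and the error raised when an Index is modified in place. The `q1` and `q2` frames are hypothetical sample data.

```python
import pandas as pd

# Hypothetical quarterly figures, used only for illustration.
q1 = pd.DataFrame({"Revenue": [100, 200]}, index=["Laptop", "Phone"])
q2 = pd.DataFrame({"Revenue": [50, 70]}, index=["Phone", "Tablet"])

# Rows are matched by label, not by position: only "Phone" exists in both.
total = q1 + q2
print(total)

# An Index cannot be edited in place; item assignment raises TypeError.
try:
    q1.index[0] = "Desktop"
    mutated = True
except TypeError:
    mutated = False
print(mutated)

# To relabel rows, create a new object with a new Index instead.
renamed = q1.rename(index={"Laptop": "Desktop"})
print(renamed)
```

`rename` returns a new DataFrame and leaves `q1` untouched, which keeps alignment predictable for anything else that still refers to the old labels.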

In [133]:
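# parse_dates asks pandas to convert "date" to datetime64. If the column cannot be
# parsed consistently, read_csv leaves it as object, so check the dtypes with df.info().
df = pd.read_csv("electronic_sales.csv", parse_dates=["date"])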

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   order_id    60 non-null     int64 
 1   date        60 non-null     object
 2   customer    60 non-null     object
 3   region      60 non-null     object
 4   product     60 non-null     object
 5   units       60 non-null     int64 
 6   unit_price  60 non-null     object
 7   returned    57 non-null     object
 8   notes       23 non-null     object
dtypes: int64(2), object(7)
memory usage: 4.3+ KB
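In the `df.info()` output above, `date` is still `object` even though `parse_dates` was requested, which usually means some values could not be parsed. One possible way to find them is an explicit conversion with `pd.to_datetime` and `errors="coerce"`. The sketch below uses a small hypothetical frame (`raw`) and assumes an ISO `YYYY-MM-DD` format, since the contents of the CSV file are not shown here.

```python
import pandas as pd

# Hypothetical stand-in for a date column that contains one bad value.
raw = pd.DataFrame({"date": ["2024-01-05", "2024-01-06", "not a date"]})

# errors="coerce" turns unparseable values into NaT instead of raising.
raw["date_parsed"] = pd.to_datetime(
    raw["date"], format="%Y-%m-%d", errors="coerce"
)

# Rows whose original text could not be parsed as a date.
bad_rows = raw[raw["date_parsed"].isna()]

print(raw.dtypes)
print(bad_rows)
```

Keeping the original text column next to the parsed one makes it easy to inspect exactly which raw values failed.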


In [None]:

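# print(df.describe(include='all'))
# df.info()

# Caution: df.info() reports unit_price as object (strings), so `*` repeats each
# string `units` times instead of multiplying numbers (2 * "800" gives "800800").
# Convert the column first, e.g. with pd.to_numeric(df['unit_price'], errors='coerce'),
# to get numeric revenue.
df['revenue'] = df['units'] * df['unit_price']
df[['order_id','product','units','unit_price','revenue']].head(5)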

   order_id     product  units  unit_price     revenue
0      1001      Laptop      2         800      800800
1      1002  Smartphone      1         500         500
2      1003      Laptop      1         800         800
3      1004  Headphones      5          50  5050505050
4      1005  Smartphone      2         500      500500
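The revenue values in the output above, such as `800800` and `5050505050`, are repeated strings rather than products, because `unit_price` was loaded as `object`. A minimal sketch of one way to fix this follows. The `sales` frame is a hypothetical stand-in that mirrors a few of the rows shown, and `pd.to_numeric` with `errors="coerce"` is one reasonable choice, not the only one.

```python
import pandas as pd

# Hypothetical stand-in: unit_price held as strings in an object column.
sales = pd.DataFrame({
    "product": ["Laptop", "Smartphone", "Headphones"],
    "units": [2, 1, 5],
    "unit_price": pd.Series(["800", "500", "50"], dtype=object),
})

# int * str repeats the string, which reproduces the problem.
wrong = sales["units"] * sales["unit_price"]

# Convert to numbers first; values that cannot be parsed become NaN.
sales["unit_price"] = pd.to_numeric(sales["unit_price"], errors="coerce")
sales["revenue"] = sales["units"] * sales["unit_price"]

print(wrong.tolist())
print(sales)
```

After the conversion, it is worth checking `sales["unit_price"].isna().sum()` to see how many values failed to parse before relying on the revenue column.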
