Creating DataFrames
 Let's look at different ways to create a pandas DataFrame, the core data
 structure you'll use for most day-to-day work in data science.

In [1]:
# From Python lists
import pandas as pd

data = [
    ["Alice", 25],
    ["Bob", 30],
    ["Charlie", 35]
]

df = pd.DataFrame(data, columns=["Name", "Age"])
print(df)

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35


In [2]:
# From a dictionary of lists
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35]
}

df = pd.DataFrame(data)

In [3]:
df

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
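
The list-of-lists and dictionary-of-lists constructions should describe the same table. A quick way to check this, as a minimal sketch (the variable names `rows`, `columns_data`, `df_from_lists`, and `df_from_dict` are only for illustration):

```python
import pandas as pd

# Row-oriented input: one inner list per row, column names given separately
rows = [["Alice", 25], ["Bob", 30], ["Charlie", 35]]

# Column-oriented input: one list per column, keyed by column name
columns_data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}

df_from_lists = pd.DataFrame(rows, columns=["Name", "Age"])
df_from_dict = pd.DataFrame(columns_data)

# equals() compares shape, labels, dtypes, and values
same = df_from_lists.equals(df_from_dict)
print(same)
```

Which form to use mostly depends on how the data arrives: row by row, or column by column.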


In [4]:
# From NumPy arrays
import numpy as np

arr = np.array([[1, 2], [3, 4]])
df = pd.DataFrame(arr, columns=["A", "B"])

In [5]:
df

   A  B
0  1  2
1  3  4


In [20]:
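# From Excel files
import os

# Check that the file exists before reading it.
# Note: this path ends in .xls, while the file read below is data.xlsx.
print(os.path.exists(r"C:/Users/rosha/Pandas practice/data.xls"))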


False


In [22]:
# xlrd reads legacy .xls files; for .xlsx files pandas uses openpyxl instead
!pip install xlrd

Collecting xlrd
  Downloading xlrd-2.0.2-py2.py3-none-any.whl.metadata (3.5 kB)
Downloading xlrd-2.0.2-py2.py3-none-any.whl (96 kB)
Installing collected packages: xlrd
Successfully installed xlrd-2.0.2


In [28]:
import pandas as pd
df = pd.read_excel(r"C:\Users\rosha\Pandas practice\data.xlsx")
df.head()

      Name  Marks  School
0   Roshan     58     JPS
1    Akash     75     JPS
2    Ankit     65    AIWC
3  Rajveer     78     JPS
4   Aditya     61     DAV


In [29]:
df.head(6)  # First 6 rows; df was already loaded in the previous cell

       Name  Marks  School
0    Roshan     58     JPS
1     Akash     75     JPS
2     Ankit     65    AIWC
3   Rajveer     78     JPS
4    Aditya     61     DAV
5  Jaskaran     66    LCPS


# From CSV files
df = pd.read_csv("data.csv")

# From JSON
df = pd.read_json("data.json")
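
The two calls above expect files named data.csv and data.json on disk. To try the same readers without any files, here is a minimal sketch that feeds them in-memory text through io.StringIO (the sample text and the names `csv_text`, `json_text`, `df_csv`, and `df_json` are made up for illustration):

```python
import io

import pandas as pd

# Made-up sample content standing in for data.csv and data.json
csv_text = "Name,Age\nAlice,25\nBob,30\n"
json_text = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'

# Both readers accept file-like objects as well as paths
df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```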

In [30]:
# From the web (example: CSV from a URL)
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"
df = pd.read_csv(url)

In [31]:
df

     total_bill   tip     sex  smoker  day    time  size
0         16.99  1.01  Female      No  Sun  Dinner     2
1         10.34  1.66    Male      No  Sun  Dinner     3
2         21.01  3.50    Male      No  Sun  Dinner     3
3         23.68  3.31    Male      No  Sun  Dinner     2
4         24.59  3.61  Female      No  Sun  Dinner     4
...         ...   ...     ...     ...  ...     ...   ...
239       29.03  5.92    Male      No  Sat  Dinner     3
240       27.18  2.00  Female     Yes  Sat  Dinner     2
241       22.67  2.00    Male     Yes  Sat  Dinner     2
242       17.82  1.75    Male      No  Sat  Dinner     2


EDA (Exploratory Data Analysis)
 Exploratory Data Analysis (EDA) is an essential first step in any data science project.
 It means looking closely at the dataset to understand its structure, spot
 patterns, identify anomalies, and uncover relationships between variables. This
 process includes generating summary statistics, checking for missing or duplicate
 data, and creating visualizations such as histograms, box plots, and scatter plots. The
 goal of EDA is to get a clear picture of what the data is telling you before applying
 any analysis or machine learning models.
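
The checks for missing and duplicate data mentioned above can be sketched as follows. The small `sample` DataFrame is made up for illustration (its column names are borrowed from the tips dataset, but the values are not real data):

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing value and one repeated row
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, np.nan, 10.34],
    "tip": [1.01, 1.66, 3.50, 1.66],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()

# Number of rows that exactly repeat an earlier row
duplicate_rows = sample.duplicated().sum()

print(missing_per_column)
print(duplicate_rows)
```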

In [32]:
df.head()  # First 5 rows

   total_bill   tip     sex  smoker  day    time  size
0       16.99  1.01  Female      No  Sun  Dinner     2
1       10.34  1.66    Male      No  Sun  Dinner     3
2       21.01  3.50    Male      No  Sun  Dinner     3
3       23.68  3.31    Male      No  Sun  Dinner     2
4       24.59  3.61  Female      No  Sun  Dinner     4


In [33]:
df.tail()  # Last 5 rows

     total_bill   tip     sex  smoker   day    time  size
239       29.03  5.92    Male      No   Sat  Dinner     3
240       27.18  2.00  Female     Yes   Sat  Dinner     2
241       22.67  2.00    Male     Yes   Sat  Dinner     2
242       17.82  1.75    Male      No   Sat  Dinner     2
243       18.78  3.00  Female      No  Thur  Dinner     2


In [34]:
df.info()  # Column info: types, non-nulls

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   total_bill  244 non-null    float64
 1   tip         244 non-null    float64
 2   sex         244 non-null    object 
 3   smoker      244 non-null    object 
 4   day         244 non-null    object 
 5   time        244 non-null    object 
 6   size        244 non-null    int64  
dtypes: float64(2), int64(1), object(4)
memory usage: 13.5+ KB


In [35]:
df.describe()   # Stats for numeric columns

       total_bill         tip        size
count  244.000000  244.000000  244.000000
mean    19.785943    2.998279    2.569672
std      8.902412    1.383638    0.951100
min      3.070000    1.000000    1.000000
25%     13.347500    2.000000    2.000000
50%     17.795000    2.900000    2.000000
75%     24.127500    3.562500    3.000000
max     50.810000   10.000000    6.000000
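
By default, describe() summarizes only the numeric columns. For text columns such as day, per-category counts are a common complement. A minimal sketch on a small made-up sample (the values are illustrative, not taken from the tips data):

```python
import pandas as pd

# Made-up sample with one text column and one numeric column
sample = pd.DataFrame({
    "day": ["Sun", "Sun", "Sat", "Thur"],
    "size": [2, 3, 2, 2],
})

# describe() on a mixed DataFrame keeps only the numeric columns by default
numeric_summary = sample.describe()

# value_counts() counts how often each category appears
day_counts = sample["day"].value_counts()

print(numeric_summary)
print(day_counts)
```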


In [37]:
df.columns  # List of column names

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')

In [38]:
df.index  # Row labels (a RangeIndex here)

RangeIndex(start=0, stop=244, step=1)

In [39]:
df.shape  # (rows, columns)

(244, 7)

Summary  
 You can create DataFrames from lists, dictionaries, NumPy arrays, files (Excel, CSV, JSON), URLs, and SQL.
 Use .head(), .info(), and .describe() to quickly explore any dataset.
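
For the SQL route mentioned in the summary, a minimal sketch using Python's built-in sqlite3 module and an in-memory database might look like this. The table name `people` and its rows are made up for illustration; a real project would connect to its own database instead.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a small made-up table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Alice", 25), ("Bob", 30), ("Charlie", 35)],
)
conn.commit()

# Run a query and load the result set into a DataFrame
df_sql = pd.read_sql_query("SELECT Name, Age FROM people ORDER BY Age", conn)
conn.close()

print(df_sql)
```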