# 📊 Creating DataFrames with Pandas

The DataFrame is the core pandas data structure, and the one you will reach for in most data science work. Here's how to create one from various data sources:

---

## ✅ From Python Lists

```python
import pandas as pd

data = [
    ["Alice", 25],
    ["Bob", 30],
    ["Charlie", 35]
]

df = pd.DataFrame(data, columns=["Name", "Age"])
print(df)
```

---

## ✅ From Dictionary of Lists (Most Common)

```python
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35]
}

df = pd.DataFrame(data)
```

Each key becomes a column, and each list becomes the column's data.
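
The key-to-column mapping can be checked directly. The sketch below uses placeholder names and values, and also shows an assumption worth knowing: lists of different lengths are rejected rather than padded.

```python
import pandas as pd

# Hypothetical sample data; the names and values are placeholders.
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
}
df = pd.DataFrame(data)

# Dictionary keys become the column labels, in insertion order.
column_names = list(df.columns)

# Lists of unequal length raise an error instead of being padded.
try:
    pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25]})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

print(column_names)
print(mismatch_error)
```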

---

## ✅ From NumPy Arrays

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
df = pd.DataFrame(arr, columns=["A", "B"])
```

> ✅ **Tip**: Provide column names. Without them, pandas labels the columns with integers (`0`, `1`, ...).
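
To see why column names matter, compare the same array with and without them. This is a small sketch; the variable names `unnamed` and `named` are only for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])

unnamed = pd.DataFrame(arr)                    # columns default to 0, 1
named = pd.DataFrame(arr, columns=["A", "B"])  # explicit, readable labels

print(list(unnamed.columns))
print(list(named.columns))
```

With names in place, a column can be selected as `named["A"]` instead of `unnamed[0]`.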

---

## ✅ From CSV Files

```python
df = pd.read_csv("data.csv")
```

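### Optional Parameters:
- `sep`: field delimiter (default `,`)
- `header`: row number to use as the column names
- `names`: list of column names to use
- `index_col`: column(s) to use as the row index
- `usecols`: subset of columns to read
- `nrows`: number of rows to read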

**Example**:

```python
df = pd.read_csv("data.csv", usecols=["Name", "Age"])  # load only these two columns
```
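
The parameters can be combined. The sketch below reads from an in-memory string (via `io.StringIO`) instead of a file so that it runs anywhere; the semicolon-separated text and the `id` and `City` columns are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated text with no header row,
# standing in for a file on disk.
raw = "1;Alice;25;Dhaka\n2;Bob;30;Khulna\n3;Charlie;35;Sylhet\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                              # field delimiter
    header=None,                          # the data has no header row
    names=["id", "Name", "Age", "City"],  # supply the column names ourselves
    usecols=["id", "Name", "Age"],        # keep only these columns
    index_col="id",                       # use "id" as the row index
    nrows=2,                              # read only the first 2 data rows
)
print(df)
```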

---

## ✅ From Excel Files

```python
df = pd.read_excel("data.xlsx")
```

> Reading `.xlsx` files relies on the `openpyxl` package. Install it if it is missing:
```bash
pip install openpyxl
```

---

## ✅ From JSON Files

```python
df = pd.read_json("data.json")
```

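> `read_json` also accepts a URL. To parse a JSON string, wrap it in `io.StringIO`; passing a literal string directly is deprecated in recent pandas versions.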
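
As a small sketch of reading JSON that is already in memory, the example below wraps a made-up JSON string in `io.StringIO` so that `read_json` treats it like a file.

```python
import io

import pandas as pd

# Hypothetical JSON text: a list of records.
json_text = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'

# StringIO turns the text into a file-like object that read_json can consume.
df = pd.read_json(io.StringIO(json_text))
print(df)
```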

---

## ✅ From SQL Databases

```python
import sqlite3

conn = sqlite3.connect("mydb.sqlite")
df = pd.read_sql("SELECT * FROM users", conn)
conn.close()  # release the database file once the data is loaded
```
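
For a version that runs without an existing database file, here is a sketch using an in-memory SQLite database. The `users` table, its columns, and its rows are invented for illustration; the `params` argument fills the `?` placeholder so the query does not have to be built by string concatenation.

```python
import sqlite3

import pandas as pd

# In-memory database so the sketch runs without a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Alice", 25), ("Bob", 30), ("Charlie", 35)],
)
conn.commit()

# The ? placeholder is filled from params.
df = pd.read_sql(
    "SELECT name, age FROM users WHERE age >= ? ORDER BY age",
    conn,
    params=(30,),
)
conn.close()
print(df)
```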

---

## ✅ From the Web (e.g., CSV URL)

```python
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"
df = pd.read_csv(url)
```

---

# 🔍 Exploratory Data Analysis (EDA)

EDA is a crucial first step in any data science workflow.

It involves:
- Understanding structure
- Spotting patterns
- Identifying missing/duplicate data
- Visualizing distributions and relationships
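
Checking for missing and duplicate data, mentioned in the list above, might look like the following sketch. The DataFrame is hypothetical and built so that it contains one missing value and one repeated row.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one repeated row.
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Bob", "Charlie"],
        "Age": [25, 30, 30, np.nan],
    }
)

missing_per_column = df.isna().sum()         # NaN count in each column
duplicate_rows = int(df.duplicated().sum())  # rows identical to an earlier row
deduplicated = df.drop_duplicates()          # keeps the first copy of each row

print(missing_per_column)
print(duplicate_rows)
print(deduplicated.shape)
```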

---

### Quick EDA Commands

```python
df.head()         # First 5 rows
df.tail()         # Last 5 rows
df.info()         # Data types, non-nulls
df.describe()     # Summary stats for numeric columns
df.columns        # Column names
df.shape          # (Rows, Columns)
```
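
By default `describe()` covers only numeric columns. As a sketch of how text columns can be summarized too, the example below uses `include="all"` and `value_counts()` on a small made-up DataFrame; the column names mirror the tips dataset but the values are placeholders.

```python
import pandas as pd

# Hypothetical data with one text column and one numeric column.
df = pd.DataFrame(
    {
        "day": ["Sun", "Sun", "Sat", "Thur"],
        "size": [2, 3, 2, 4],
    }
)

numeric_summary = df.describe()           # only the numeric "size" column
all_summary = df.describe(include="all")  # adds unique/top/freq for text columns
day_counts = df["day"].value_counts()     # frequency of each category

print(all_summary)
print(day_counts)
```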

---

# 📝 Summary

- You can create DataFrames from **lists, dicts, arrays, files, web URLs, and SQL databases**
- Use `.head()`, `.info()`, `.describe()` to **quickly explore your data**

In [2]:
import pandas as pd

In [3]:
data = [["Emon", 87], ["Naimul", 83], ["Hasan", 80]]

In [4]:
data

[['Emon', 87], ['Naimul', 83], ['Hasan', 80]]

In [5]:
pd.DataFrame(data, columns=["Name", "Marks"])

     Name  Marks
0    Emon     87
1  Naimul     83
2   Hasan     80


In [6]:
data = {"a": [1, 4, 7], "b": [56, 35, 59]}

In [7]:
data

{'a': [1, 4, 7], 'b': [56, 35, 59]}

In [8]:
df = pd.DataFrame(data)
df

   a   b
0  1  56
1  4  35
2  7  59


In [9]:
import numpy as np

In [10]:
arr = np.array([[1, 2], [5, 6]])
arr

array([[1, 2],
       [5, 6]])

In [11]:
df = pd.DataFrame(arr, columns=["A", "B"])
df

   A  B
0  1  2
1  5  6


In [12]:
df = pd.read_excel("data.xlsx")
df

     Name       School  Marks
0    Emon        BCBHS     93
1  Naimul        BCPSC     88
2   Hasan      Adamjee     84
3  Tasnim  Natore Govt     97
4   Bindu        BCBHS     95


In [13]:
df = pd.read_csv("data.csv")
df

     Name       School  Marks
0    Emon        BCBHS     93
1  Naimul        BCPSC     88
2   Hasan      Adamjee     84
3  Tasnim  Natore Govt     97
4   Bindu        BCBHS     95


In [14]:
df = pd.read_json("data.json")
df

     Name  Age   City    Language
0    Emon   25  Dhaka      Python
1   Rifat   22  Dhaka        Java
2  Shakib   30  Dhaka         C++
3   Sakib   28  Dhaka  JavaScript
4  Nashit   26  Dhaka        Ruby


In [15]:
df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv")
df

     total_bill   tip     sex  smoker  day    time  size
0         16.99  1.01  Female      No  Sun  Dinner     2
1         10.34  1.66    Male      No  Sun  Dinner     3
2         21.01  3.50    Male      No  Sun  Dinner     3
3         23.68  3.31    Male      No  Sun  Dinner     2
4         24.59  3.61  Female      No  Sun  Dinner     4
...         ...   ...     ...     ...  ...     ...   ...
239       29.03  5.92    Male      No  Sat  Dinner     3
240       27.18  2.00  Female     Yes  Sat  Dinner     2
241       22.67  2.00    Male     Yes  Sat  Dinner     2
242       17.82  1.75    Male      No  Sat  Dinner     2


In [16]:
df.head()          # Will show the first 5 rows of the dataset

   total_bill   tip     sex  smoker  day    time  size
0       16.99  1.01  Female      No  Sun  Dinner     2
1       10.34  1.66    Male      No  Sun  Dinner     3
2       21.01  3.50    Male      No  Sun  Dinner     3
3       23.68  3.31    Male      No  Sun  Dinner     2
4       24.59  3.61  Female      No  Sun  Dinner     4


In [17]:
df.tail()          # Will show the last 5 rows of the dataset

     total_bill   tip     sex  smoker   day    time  size
239       29.03  5.92    Male      No   Sat  Dinner     3
240       27.18  2.00  Female     Yes   Sat  Dinner     2
241       22.67  2.00    Male     Yes   Sat  Dinner     2
242       17.82  1.75    Male      No   Sat  Dinner     2
243       18.78  3.00  Female      No  Thur  Dinner     2


In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   total_bill  244 non-null    float64
 1   tip         244 non-null    float64
 2   sex         244 non-null    object 
 3   smoker      244 non-null    object 
 4   day         244 non-null    object 
 5   time        244 non-null    object 
 6   size        244 non-null    int64  
dtypes: float64(2), int64(1), object(4)
memory usage: 13.5+ KB


In [19]:
df.describe()

       total_bill         tip        size
count  244.000000  244.000000  244.000000
mean    19.785943    2.998279    2.569672
std      8.902412    1.383638    0.951100
min      3.070000    1.000000    1.000000
25%     13.347500    2.000000    2.000000
50%     17.795000    2.900000    2.000000
75%     24.127500    3.562500    3.000000
max     50.810000   10.000000    6.000000


In [20]:
df.columns

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')

In [21]:
df.shape

(244, 7)