# Creating DataFrames: Multiple Methods and Best Practices

### What is a Pandas DataFrame?

A **DataFrame** is the core data structure in Pandas, a two-dimensional, labeled data table. Think of it like a spreadsheet in Python: it has **rows and columns**, with labels for both (an index for the rows and names for the columns). Each column in a DataFrame is a **Series**, so when we combine multiple Series side by side on a shared index, we get a DataFrame.
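
To make the relationship between Series and DataFrames concrete, here is a minimal sketch. The Series names `ages` and `fares` and their values are illustrative assumptions, not taken from a real dataset.

```python
import pandas as pd

# Two Series that share the default index 0, 1, 2 (made-up values).
ages = pd.Series([22, 38, 26], name="Age")
fares = pd.Series([7.25, 71.28, 7.92], name="Fare")

# Placing the Series side by side (axis=1) produces a DataFrame
# whose column names come from the Series names.
df = pd.concat([ages, fares], axis=1)
print(df)

# Selecting a single column gives back a Series.
age_column = df["Age"]
print(type(age_column).__name__)
```

`pd.concat` is only one way to do this; passing a dictionary of Series to `pd.DataFrame` should give an equivalent result.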

This structure is perfect for real-world data — like the **Titanic dataset**, where each row is a passenger, and columns hold information like age, fare, gender, etc.

Working with DataFrames gives us full control over our data. We can select, filter, sort, clean, group, analyze, and export — all with clean and readable code.

### Why Do We Use DataFrames?

In real-world AI/ML projects, much of the work starts with a DataFrame:

- **Clean & Explore** raw datasets from CSV, Excel, JSON, SQL
- **Engineer Features** for machine learning (like combining or transforming columns)
- **Handle Missing Data** and fix inconsistencies
- **Split datasets** into train/test sets for modeling
- **Run analytics** like group summaries, correlations, and aggregations

In short, DataFrames are **where data becomes usable**.
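
A few of the tasks listed above can be sketched on a tiny table. The column names, values, and the fare threshold of 10 are all made up for illustration. This is a rough sketch of the workflow, not a recommended cleaning recipe.

```python
import pandas as pd

# Tiny made-up table; None marks a missing age.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# Handle missing data: fill the missing age with the column median.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Engineer a feature: flag fares above an arbitrary threshold.
df["HighFare"] = df["Fare"] > 10

# Run analytics: mean fare per group.
mean_fare = df.groupby("Sex")["Fare"].mean()
print(mean_fare)
```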

### Creating DataFrames – Different Methods

1. **From a Dictionary of Lists**
    
    Each key becomes a column name, and each list becomes that column’s values. All lists must have the same length. This is a **common** way to create small, structured tables manually.

In [1]:
import pandas as pd
    
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
    "City": ["New York", "Chicago", "Houston"]
}
    
df = pd.DataFrame(data)
print(df)

      Name  Age      City
0    Alice   25  New York
1      Bob   30   Chicago
2  Charlie   22   Houston
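
Two details of the dictionary-of-lists form are worth a quick sketch: row labels can be supplied through the `index` parameter, and lists of unequal length are rejected. The labels `"a"`, `"b"`, `"c"` are arbitrary choices for illustration.

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
}

# Optional: supply row labels instead of the default 0, 1, 2.
df = pd.DataFrame(data, index=["a", "b", "c"])
print(df.loc["b", "Name"])

# Lists of different lengths are rejected with a ValueError.
try:
    pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25]})
except ValueError as err:
    print("ValueError:", err)
```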


2. **From a List of Dictionaries**
    
    Each dictionary becomes a row. Keys become column names. This is useful when data comes from APIs or JSON.

In [2]:
data = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30, "City": "Chicago"},
    {"Name": "Charlie", "Age": 22, "City": "Houston"}
]
    
df = pd.DataFrame(data)
print(df)

      Name  Age      City
0    Alice   25  New York
1      Bob   30   Chicago
2  Charlie   22   Houston
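
Records from APIs or JSON do not always share the same keys. As a small sketch (the `records` list is hypothetical), a key missing from one dictionary becomes a missing value in that row:

```python
import pandas as pd

# Hypothetical API-style records; the second one has no "City" key.
records = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30},
]

df = pd.DataFrame(records)

# The absent value shows up as NaN rather than raising an error.
print(df)
print(df["City"].isna().tolist())
```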


3. **From a List of Lists (with Column Names)**
    
    We can also use a nested list, and define column names separately.

In [3]:
data = [
    ["Alice", 25, "New York"],
    ["Bob", 30, "Chicago"],
    ["Charlie", 22, "Houston"]
]
    
columns = ["Name", "Age", "City"]
df = pd.DataFrame(data, columns=columns)
print(df)

      Name  Age      City
0    Alice   25  New York
1      Bob   30   Chicago
2  Charlie   22   Houston
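
If the `columns` argument is left out, the nested-list form still works but the columns get integer labels. A short sketch, with `rows` as an illustrative variable name:

```python
import pandas as pd

rows = [["Alice", 25], ["Bob", 30]]

# Without column names, pandas falls back to integer labels 0, 1, ...
unnamed = pd.DataFrame(rows)
print(list(unnamed.columns))

# Names can still be assigned afterwards.
unnamed.columns = ["Name", "Age"]
print(list(unnamed.columns))
```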


4. **From NumPy Arrays**
    
    If we already have tabular data in a two-dimensional NumPy array, we can convert it easily. Each row of the array becomes a row of the DataFrame.

In [4]:
import numpy as np
    
arr = np.array([[1, 2], [3, 4], [5, 6]])
df = pd.DataFrame(arr, columns=["A", "B"])
print(df)

   A  B
0  1  2
1  3  4
2  5  6
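
The following sketch shows two related points, under the assumption of a small float array with made-up values: the DataFrame columns take on the array's dtype, and `to_numpy()` converts back to a plain array.

```python
import numpy as np
import pandas as pd

# A float array; the explicit dtype only makes the result predictable.
arr = np.array([[1.5, 2.0], [3.5, 4.0]], dtype=np.float64)

# Row labels are optional; "row1" and "row2" are arbitrary.
df = pd.DataFrame(arr, columns=["A", "B"], index=["row1", "row2"])

# Every column inherits the array's dtype.
print(df.dtypes)

# to_numpy() converts the values back to a NumPy array.
back = df.to_numpy()
print(back.shape)
```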


5. **Creating an Empty DataFrame**
    
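    We can initialize an empty DataFrame when the column names are known upfront but the data arrives later. Note that growing a DataFrame row-by-row is slow, because each addition copies the existing data. When rows arrive one at a time (for example from user input), it is usually better to collect them in a Python list and build the DataFrame once at the end.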

In [5]:
empty_df = pd.DataFrame(columns=["Name", "Age", "City"])
print(empty_df)

Empty DataFrame
Columns: [Name, Age, City]
Index: []
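
For the row-by-row case, one common alternative to filling an empty DataFrame is to accumulate plain Python dictionaries and construct the DataFrame in a single call. This is a sketch under that assumption; the `incoming` list stands in for whatever source actually delivers the rows.

```python
import pandas as pd

# Hypothetical incoming rows, e.g. gathered from user input one at a time.
incoming = [("Alice", 25, "New York"), ("Bob", 30, "Chicago")]

# Collect rows in a cheap Python list first.
rows = []
for name, age, city in incoming:
    rows.append({"Name": name, "Age": age, "City": city})

# Build the DataFrame once; columns= fixes the column order.
df = pd.DataFrame(rows, columns=["Name", "Age", "City"])
print(df)
```

Building the table in one step also lets pandas infer a numeric dtype for `Age`, whereas the columns of an empty DataFrame start out with the generic `object` dtype.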


### Exercises

Q1. Create a DataFrame using a dictionary of lists containing the names, marks, and grades of three students.

In [6]:
data = {
    "Names": ["Sujit", "Ram", "Hari"],
    "Marks": [50, 60, 90],
    "Grade": ["B", "B+", "A+"]
}

print(pd.DataFrame(data))

   Names  Marks Grade
0  Sujit     50     B
1    Ram     60    B+
2   Hari     90    A+


Q2. Build a DataFrame from a list of dictionaries, each representing a car with keys like "brand", "model", and "price".

In [7]:
cars = [
    {"brand": "Toyota", "model": "Corolla", "price": 25000},
    {"brand": "Honda", "model": "Civic", "price": 27000},
    {"brand": "Tesla", "model": "Model 3", "price": 42000}
]

cars_df = pd.DataFrame(cars)
print(cars_df)

    brand    model  price
0  Toyota  Corolla  25000
1   Honda    Civic  27000
2   Tesla  Model 3  42000


Q3. Use a NumPy array to create a DataFrame of shape (3, 3) with columns: "X", "Y", and "Z".

In [8]:
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

df_from_array = pd.DataFrame(arr, columns=["X", "Y", "Z"])
print(df_from_array)

   X  Y  Z
0  1  2  3
1  4  5  6
2  7  8  9


Q4. Create an empty DataFrame with column headers for employee details like "Name", "Department", "Salary".

In [9]:
empty_employees = pd.DataFrame(columns=["Name", "Department", "Salary"])
print(empty_employees)

Empty DataFrame
Columns: [Name, Department, Salary]
Index: []


### Summary

Understanding how to create DataFrames is one of the first steps to becoming comfortable with real data. Unlike lists or arrays, DataFrames are **designed for structured, labeled data**, making them perfect for handling rows and columns from datasets like Titanic.

We can create DataFrames from:

- Dictionaries of lists (for manual table-like creation)
- Lists of dictionaries (often from JSON or APIs)
- Lists of lists (with column names supplied separately)
- NumPy arrays (for numerical and scientific data)
- Even empty shells (when the columns are known before the data)

Each method gives us flexibility depending on where our data comes from. In AI/ML, we’ll often load CSVs, Excel files, or datasets from Kaggle, but it’s equally important to know how to **manually build or simulate DataFrames** for testing and experimentation.
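
As one way to simulate a DataFrame for testing, random values can be generated with NumPy and wrapped in a dictionary of arrays. The column names, value ranges, and seed below are arbitrary assumptions chosen to resemble a Titanic-style table; they are not real data.

```python
import numpy as np
import pandas as pd

# Fixed seed so the fake data is reproducible between runs.
rng = np.random.default_rng(seed=0)

n = 5
fake = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n),               # integers in [18, 80)
    "Fare": rng.uniform(5.0, 100.0, size=n).round(2),  # floats in [5, 100)
    "Survived": rng.integers(0, 2, size=n),            # 0 or 1
})
print(fake)
```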

As we continue our journey, DataFrames will be at the center of everything we do — from cleaning and analyzing data, to building models and visualizing results. Knowing how to create and control them gives us a strong foundation.