# Pandas-2: Introduction & Creation of DataFrames


## 1. What is a DataFrame?

A DataFrame is a two-dimensional, labeled data structure that stores data in rows and columns. 
Each column has a single data type, but different columns can have different data types. 

It is designed for handling and analyzing structured (tabular) data efficiently.

A **DataFrame** is:

- **Two-dimensional**: Data is arranged in rows and columns.
- **Labeled**: Each row has an index label (by default, integers starting at 0), and each column has a name.
- **Flexible**: Can store different data types in different columns (numbers, text, dates, etc.).
- **Mutable**: You can change, add, or delete rows and columns anytime.

**Example:**

| Name    | Age | City     |
|---------|-----|----------|
| Alice   | 25  | London   |
| Bob     | 30  | New York |
| Charlie | 35  | Paris    |

This is a DataFrame with:

- **3 rows**
- **3 columns** (`Name`, `Age`, `City`)
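
The properties listed above can be checked directly on a small DataFrame. The following is a minimal sketch that rebuilds the example table; the variable names (`people`, `n_rows`, `n_cols`) are chosen only for illustration.

```python
import pandas as pd

# Rebuild the example table as a DataFrame
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["London", "New York", "Paris"],
})

# Two-dimensional: shape is (number of rows, number of columns)
n_rows, n_cols = people.shape
print(n_rows, n_cols)

# Labeled: column names and the default integer row index
print(list(people.columns))
print(list(people.index))

# Flexible: each column carries its own dtype
print(people.dtypes)

# Mutable: add a new column, then remove it again
people["Country"] = ["UK", "USA", "France"]
people = people.drop(columns="Country")
```

`shape`, `columns`, `index`, and `dtypes` are attributes, so they are read without parentheses.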


## 2. Different ways to create a DataFrame


Let's look at different ways to create a DataFrame, step by step.

In [4]:
# import pandas
import pandas as pd
print(pd.__version__)

2.3.1


**From a Dictionary of Lists**

In [14]:
data = {
    'Name': ['Amit', 'Anand', 'Akash'],
    'Age': [25, 28, 30],
    'City': ['Mumbai', 'Delhi', 'Patna']
}
df = pd.DataFrame(data)
print(df)


    Name  Age    City
0   Amit   25  Mumbai
1  Anand   28   Delhi
2  Akash   30   Patna
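
Two details of the dictionary-of-lists approach are worth a quick sketch: the optional `index` argument replaces the default `0, 1, 2` row labels, and all lists must have the same length. The labels `r1` to `r3` and the `uneven` dictionary below are made up for illustration.

```python
import pandas as pd

data = {
    "Name": ["Amit", "Anand", "Akash"],
    "Age": [25, 28, 30],
    "City": ["Mumbai", "Delhi", "Patna"],
}

# Supply custom row labels instead of the default 0, 1, 2
df_labeled = pd.DataFrame(data, index=["r1", "r2", "r3"])
print(df_labeled)

# Lists of different lengths cannot form a rectangular table
uneven = {"Name": ["Amit", "Anand"], "Age": [25]}
try:
    pd.DataFrame(uneven)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```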


**From a List of Dictionaries**

In [15]:
data = [
    {'Name': 'Amit', 'Age': 25, 'City': 'Mumbai'},
    {'Name': 'Anand', 'Age': 28, 'City': 'Delhi'},
    {'Name': 'Akash', 'Age': 30, 'City': 'Patna'}
]
df = pd.DataFrame(data)
print(df)


    Name  Age    City
0   Amit   25  Mumbai
1  Anand   28   Delhi
2  Akash   30   Patna
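
Unlike a dictionary of lists, a list of dictionaries does not need every record to have the same keys. The sketch below assumes some records are incomplete (the gaps are invented for illustration) and shows that pandas fills the missing cells with `NaN`.

```python
import pandas as pd

records = [
    {"Name": "Amit", "Age": 25, "City": "Mumbai"},
    {"Name": "Anand", "Age": 28},        # no 'City' key
    {"Name": "Akash", "City": "Patna"},  # no 'Age' key
]

df_gaps = pd.DataFrame(records)
print(df_gaps)

# Count the missing values in each column
print(df_gaps.isna().sum())

# 'Age' now contains NaN, so it is stored as a float column
print(df_gaps["Age"].dtype)
```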


**From a List of Lists (with column names)**

In [16]:
data = [
    ['Amit', 25, 'Mumbai'],
    ['Anand', 28, 'Delhi'],
    ['Akash', 30, 'Patna']
]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)


    Name  Age    City
0   Amit   25  Mumbai
1  Anand   28   Delhi
2  Akash   30   Patna
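
If the `columns` argument is left out, pandas labels the columns with integers. A minimal sketch (the names `rows`, `df_default`, and `df_named` are illustrative):

```python
import pandas as pd

rows = [
    ["Amit", 25, "Mumbai"],
    ["Anand", 28, "Delhi"],
    ["Akash", 30, "Patna"],
]

# Without column names, the columns are labeled 0, 1, 2
df_default = pd.DataFrame(rows)
print(df_default)

# Column names can also be assigned after creation
df_named = df_default.copy()
df_named.columns = ["Name", "Age", "City"]
print(df_named)
```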


**From a NumPy Array**

In [17]:
import numpy as np

arr = np.array([
    ['Amit', 25, 'Mumbai'],
    ['Anand', 28, 'Delhi'],
    ['Akash', 30, 'Patna']
])

In [18]:
df = pd.DataFrame(arr, columns=['Name', 'Age', 'City'])
print(df)

    Name Age    City
0   Amit  25  Mumbai
1  Anand  28   Delhi
2  Akash  30   Patna
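
Note that `Age` is left-aligned in the output above. A NumPy array holds a single data type, so mixing text and numbers turns every value into a string, including the ages. The sketch below checks this and converts the column back to integers with `astype`; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([
    ["Amit", 25, "Mumbai"],
    ["Anand", 28, "Delhi"],
    ["Akash", 30, "Patna"],
])

# The whole array has one Unicode string dtype
print(arr.dtype)

df_arr = pd.DataFrame(arr, columns=["Name", "Age", "City"])
age_is_numeric_before = pd.api.types.is_numeric_dtype(df_arr["Age"])
print(age_is_numeric_before)

# Convert the string ages to integers
df_arr["Age"] = df_arr["Age"].astype(int)
age_is_numeric_after = pd.api.types.is_numeric_dtype(df_arr["Age"])
print(age_is_numeric_after)
```

For mixed-type tables, a dictionary or list of lists keeps the numeric columns numeric without this extra conversion step.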


**From Series objects**

In [19]:
name = pd.Series(['Amit', 'Anand', 'Akash'])
age = pd.Series([25, 28, 30])
city = pd.Series(['Mumbai', 'Delhi', 'Patna'])


In [20]:
df = pd.DataFrame({'Name': name, 'Age': age, 'City': city})
print(df)

    Name  Age    City
0   Amit   25  Mumbai
1  Anand   28   Delhi
2  Akash   30   Patna
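
When a DataFrame is built from Series, the values are matched by index label rather than by position. The sketch below assumes two Series with different, made-up labels to show that alignment; a label missing from one Series produces `NaN` in that column.

```python
import pandas as pd

name = pd.Series(["Amit", "Anand", "Akash"], index=["a", "b", "c"])
age = pd.Series([25, 28], index=["a", "b"])  # no entry for label 'c'

# Rows are aligned on the index labels a, b, c
df_aligned = pd.DataFrame({"Name": name, "Age": age})
print(df_aligned)
```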


**From a CSV / Excel file**

In [None]:
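# CSV (the file must exist at this path)
df = pd.read_csv('indian_data.csv')

# Excel (reading .xlsx files needs an engine such as openpyxl installed)
df = pd.read_excel('indian_data.xlsx')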
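
Since `indian_data.csv` may not be available on your machine, here is a self-contained sketch that reads CSV text from memory instead of from disk. It relies on `pd.read_csv` accepting any file-like object; the CSV content is invented for illustration.

```python
import io
import pandas as pd

# CSV content held in a string, standing in for a file on disk
csv_text = (
    "Name,Age,City\n"
    "Amit,25,Mumbai\n"
    "Anand,28,Delhi\n"
    "Akash,30,Patna\n"
)

# The first line is used as the header row by default
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)

# Write back to CSV without the row index, then read it again
buffer = io.StringIO()
df_csv.to_csv(buffer, index=False)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(round_trip.equals(df_csv))
```

Passing `index=False` to `to_csv` avoids writing the row labels as an extra unnamed column.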


In [3]:
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# The Iris dataset has no header, so specify column names
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

# Read CSV without index column
df = pd.read_csv(url, header=None, names=column_names, index_col=False)

print(df.head())


   sepal_length  sepal_width  petal_length  petal_width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa


## DataFrame Internals

---

Happy Learning! Team DecodeAiML