# Create Pandas DataFrame from Python Dictionary

## Overview
In this class, you will learn how to convert a **Python Dictionary** into a **Pandas DataFrame**.  
We will cover:
- Creating a DataFrame from different types of dictionaries.
- Using the **`pd.DataFrame()`** constructor.
- Using the **`from_dict()`** method.

---

## Why Use a Dictionary to Create a DataFrame?
- A **dictionary** in Python stores key-value pairs and is a flexible data structure.
- **Pandas DataFrame** can efficiently handle dictionary-based data.
- Converting a dictionary into a **tabular format** allows easier data analysis.

---

## Methods to Convert a Dictionary to a DataFrame
1. **Using `pd.DataFrame()` Constructor**  
   - Creates a DataFrame from a dictionary of lists or arrays.  
2. **Using `from_dict()` Method**  
   - Allows finer control over dictionary-based DataFrame creation.  
   - Supports different orientations such as **columns or index**.
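
As a quick sanity check of the two methods listed above, the following minimal sketch builds the same table both ways. The `scores` dictionary is hypothetical sample data, not part of the lesson's dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
scores = {"name": ["Joe", "Nat"], "marks": [85.1, 77.8]}

# Route 1: the DataFrame constructor
df_constructor = pd.DataFrame(scores)

# Route 2: the from_dict() class method (default orient="columns")
df_from_dict = pd.DataFrame.from_dict(scores)

# For a dictionary of lists, both routes should give the same table
same = df_constructor.equals(df_from_dict)
print(same)
```

For a plain dictionary of lists the two calls are interchangeable; `from_dict()` becomes useful mainly when a different orientation is needed.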

---


A Python dictionary is a data structure that stores data as key-value pairs. Converting dictionary data to a DataFrame makes it available to the analysis functions that DataFrames provide.

There are multiple ways to convert a Python dictionary into a Pandas DataFrame. The most commonly used are:

1. DataFrame constructor
2. **`from_dict()`**

## Create DataFrame from dict using constructor

The **DataFrame constructor** can create a DataFrame from several Python data structures, such as a dict, a list, a tuple, or a NumPy ndarray. (A `set` is not accepted because it is unordered.)
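
To illustrate the constructor accepting inputs other than a dictionary, here is a small sketch with a list of dictionaries, a list of tuples, and a 2-D NumPy array. The sample values are made up for this illustration.

```python
import numpy as np
import pandas as pd

# List of dictionaries: each dictionary becomes one row
from_records = pd.DataFrame([{"name": "Joe", "age": 20}, {"name": "Nat", "age": 21}])

# List of tuples: column labels are supplied separately
from_tuples = pd.DataFrame([("Joe", 20), ("Nat", 21)], columns=["name", "age"])

# 2-D NumPy array: each inner array becomes one row
from_array = pd.DataFrame(np.array([[20, 85], [21, 77]]), columns=["age", "marks"])

print(from_records.shape, from_tuples.shape, from_array.shape)
```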

**Example:**

We create a DataFrame object from a dictionary that contains student data.

<div>
<img src="img/dfdict.png" width="300"/>
</div>

In [None]:
import pandas as pd

# Create dict object
student_dict = {"name": ["Joe", "Nat", "Harry"], "age": [20, 21, 19], "marks": [85.10, 77.80, 91.54]}
print(student_dict)

# Create DataFrame from dict
student_df = pd.DataFrame(student_dict)
student_df

{'name': ['Joe', 'Nat', 'Harry'], 'age': [20, 21, 19], 'marks': [85.1, 77.8, 91.54]}


| | name | age | marks |
|---|------|-----|-------|
| 0 | Joe | 20 | 85.1 |
| 1 | Nat | 21 | 77.8 |
| 2 | Harry | 19 | 91.54 |


>**Note:** When you convert a **`dict`** to a DataFrame, by default all the keys of the **`dict`** object become columns, and the integers 0, 1, 2, …, n-1 (for n rows) are assigned as the row index.
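
The default labels can be inspected directly. This short sketch uses a reduced version of the student dictionary and reads the column and row labels back from the DataFrame.

```python
import pandas as pd

student_dict = {"name": ["Joe", "Nat", "Harry"], "age": [20, 21, 19]}
student_df = pd.DataFrame(student_dict)

# Dictionary keys become the column labels
column_labels = list(student_df.columns)

# Rows receive a default integer index starting at 0
row_labels = list(student_df.index)

print(column_labels)
print(row_labels)
```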

## DataFrame from dict with required columns only

While converting the whole **`dict`** to DataFrame, we may need only some of the columns to be included in the resulting DataFrame.

We can select only the required columns by passing a list of column labels to the **`columns`** parameter of the constructor, for example **`columns=['col1', 'col2']`**.

**Example:**

To analyze annual scores in the student DataFrame, we need only the student **`name`** and **`marks`**; the **`age`** column is not required. The example below selects only the required columns.

In [None]:
import pandas as pd

# Create dict object
student_dict = {"name": ["Joe", "Nat", "Harry"], "age": [20, 21, 19], "marks": [85.10, 77.80, 91.54]}
print(student_dict)

# Create DataFrame from dict
student_df = pd.DataFrame(student_dict, columns=["name", "marks"])
student_df

{'name': ['Joe', 'Nat', 'Harry'], 'age': [20, 21, 19], 'marks': [85.1, 77.8, 91.54]}


| | name | marks |
|---|------|-------|
| 0 | Joe | 85.1 |
| 1 | Nat | 77.8 |
| 2 | Harry | 91.54 |
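
The `columns` parameter also controls column order, and a label that is not a key in the dictionary produces a column of missing values rather than an error. The sketch below assumes a hypothetical label `grade` that does not exist in `student_dict`.

```python
import pandas as pd

student_dict = {"name": ["Joe", "Nat", "Harry"], "age": [20, 21, 19], "marks": [85.10, 77.80, 91.54]}

# "grade" is a hypothetical label that is not a key of student_dict
student_df = pd.DataFrame(student_dict, columns=["marks", "name", "grade"])

# Columns follow the requested order; "age" is dropped
print(list(student_df.columns))

# The unknown label yields a column filled with missing values
print(student_df["grade"].isna().all())
```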


## DataFrame from dict with user-defined indexes

In pandas DataFrame, each row has an index that is used to identify each row. In some cases, we need to provide a customized index for each row. We can do that while creating the DataFrame from **`dict`** using the **`index`** parameter of the DataFrame constructor.

The default index is a range of integers from 0 to n-1, where n is the number of rows. We can pass a list of row labels as **`index=['index1','index2']`** to the DataFrame constructor. The list must contain one label per row.

**Example:**

In the example below, we give each student a custom index label, which makes the table more readable and lets us access a row by its label.

In [None]:
# import pandas library
import pandas as pd

# Create dict object
student_dict = {"name": ["Joe", "Nat", "Harry"], "age": [20, 21, 19], "marks": [85.10, 77.80, 91.54]}
print(student_dict)

# Create DataFrame from dict
student_df = pd.DataFrame(student_dict, index=["stud1", "stud2", "stud3"])
student_df

{'name': ['Joe', 'Nat', 'Harry'], 'age': [20, 21, 19], 'marks': [85.1, 77.8, 91.54]}


| | name | age | marks |
|---|------|-----|-------|
| stud1 | Joe | 20 | 85.1 |
| stud2 | Nat | 21 | 77.8 |
| stud3 | Harry | 19 | 91.54 |
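
Once rows carry custom labels, they can be looked up with `.loc`. The following sketch reuses the student data and reads one row and one cell by label.

```python
import pandas as pd

student_dict = {"name": ["Joe", "Nat", "Harry"], "age": [20, 21, 19], "marks": [85.10, 77.80, 91.54]}
student_df = pd.DataFrame(student_dict, index=["stud1", "stud2", "stud3"])

# Select one row by its label; the result is a Series keyed by column name
second_student = student_df.loc["stud2"]
print(second_student["name"])

# Select a single value by row label and column label
harry_marks = student_df.loc["stud3", "marks"]
print(harry_marks)
```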


## DataFrame from dict by changing the column data type

By default, when a DataFrame is created from a **`dict`** with the constructor, pandas keeps the data type it infers from the values in the dict. If we need a different data type in the resulting DataFrame, we can use the **`dtype`** parameter of the constructor.

Only a single data type can be specified, as **`dtype='data_type'`**, and it applies to all the data in the resulting DataFrame. If no **`dtype`** is given, pandas infers the type of each column from the data.

**Example:**

In the example below, we try to change the data type of all columns to **float64**. The constructor does not accept an **`errors`** argument, so passing one raises a **`TypeError`**. Calling **`astype('float64', errors='ignore')`** on the existing DataFrame works instead: in the output shown, **`age`** changes to **float64**, while **`name`** and **`marks`** remain **`object`**. The **`name`** column holds strings that cannot be converted to **float64**.

In [None]:
# import pandas library
import pandas as pd

# Create dict object
student_dict = {"name": ["Joe", "Nat", "Harry"], "age": [20, 21, 19], "marks": ["85", "77", "91.54"]}

# Create DataFrame from dict
student_df = pd.DataFrame(student_dict)
print("DataFrame with inferred data type : \n", student_df.dtypes)


DataFrame with inferred data type : 
 name     object
age       int64
marks    object
dtype: object


In [None]:
# The DataFrame constructor has no `errors` parameter, so this call raises a TypeError
student_df = pd.DataFrame(student_dict, dtype="float64", errors='ignore')

TypeError: DataFrame.__init__() got an unexpected keyword argument 'errors'

In [None]:
student_df = student_df.astype('float64', errors='ignore')
student_df

student_df.dtypes

name      object
age      float64
marks     object
dtype: object

>**Note:** With **`errors='ignore'`**, a failed conversion is suppressed and the original data type is kept. In the output above, only **`age`** was converted; **`name`** and **`marks`** kept the **`object`** type.
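
When only some columns hold numeric data, one option is to convert those columns explicitly instead of applying a single type to the whole DataFrame. The sketch below is one possible approach, not the only one: it passes a column-to-type mapping to `astype` and also shows `pd.to_numeric` for a single column.

```python
import pandas as pd

student_dict = {"name": ["Joe", "Nat", "Harry"], "age": [20, 21, 19], "marks": ["85", "77", "91.54"]}
student_df = pd.DataFrame(student_dict)

# Convert only the columns that hold numeric data; "name" is left untouched
converted_df = student_df.astype({"age": "float64", "marks": "float64"})
print(converted_df.dtypes)

# pd.to_numeric is an alternative for a single column of numeric strings
marks_numeric = pd.to_numeric(student_df["marks"])
print(marks_numeric.dtype)
```

Because the mapping names each column, a column that cannot be converted is simply left out, and no error suppression is needed.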


---

### **1. Creating a DataFrame from a Dictionary (Default `columns` Orientation)**
By default, `from_dict()` creates a DataFrame where dictionary **keys become column names** and values become row data.

```python
import pandas as pd

# Dictionary with lists as values
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}

# Creating DataFrame
df = pd.DataFrame.from_dict(data)

print(df)
```

#### **Output:**
```
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
```

---

### **2. Using `orient='index'` to Create a DataFrame**
When `orient='index'`, dictionary **keys become row indices** and values become row data.

```python
# Dictionary with nested dictionaries
data = {
    "Alice": {"Age": 25, "City": "New York"},
    "Bob": {"Age": 30, "City": "Los Angeles"},
    "Charlie": {"Age": 35, "City": "Chicago"}
}

# Creating DataFrame with index orientation
df = pd.DataFrame.from_dict(data, orient='index')

print(df)
```

#### **Output:**
```
         Age         City
Alice     25     New York
Bob       30  Los Angeles
Charlie   35      Chicago
```
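
With `orient='index'`, the values can also be plain lists, in which case the column names can be supplied through the `columns` parameter of `from_dict()`. The `rows` dictionary below is hypothetical data in that shape.

```python
import pandas as pd

# Hypothetical data: each key maps to a list holding one row's values
rows = {
    "Alice": [25, "New York"],
    "Bob": [30, "Los Angeles"],
}

# `columns` is only accepted by from_dict() together with orient="index"
people_df = pd.DataFrame.from_dict(rows, orient="index", columns=["Age", "City"])
print(people_df)
```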

---

### **3. Creating a DataFrame with `orient='columns'` (Explicit)**
This is equivalent to the default behavior but explicitly specifying `orient='columns'`.

```python
df = pd.DataFrame.from_dict(data, orient='columns')

print(df)
```
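Note that `data` here still refers to the nested dictionary from **Example 2**. With `orient='columns'`, its outer keys (`Alice`, `Bob`, `Charlie`) become column names and its inner keys (`Age`, `City`) become the row index, so the result is the transpose of the Example 2 output rather than the same table.  
For standard dictionaries of lists (like Example 1), the result is identical to the default call.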
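
To make the effect of the two orientations on a nested dictionary concrete, this sketch builds both versions from a shortened copy of the Example 2 data and compares their labels.

```python
import pandas as pd

data = {
    "Alice": {"Age": 25, "City": "New York"},
    "Bob": {"Age": 30, "City": "Los Angeles"},
}

by_columns = pd.DataFrame.from_dict(data, orient="columns")
by_index = pd.DataFrame.from_dict(data, orient="index")

# orient="columns": outer keys are columns, inner keys are the row index
print(list(by_columns.columns), list(by_columns.index))

# orient="index": outer keys are the row index, inner keys are columns
print(list(by_index.index), list(by_index.columns))
```

With `orient="columns"`, each column mixes numbers and strings, which is usually less convenient for analysis than the one-person-per-row layout.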

---

### **4. Handling Missing Data**
If some dictionary entries have missing keys, Pandas fills them with `NaN`.

```python
data = {
    "Alice": {"Age": 25, "City": "New York"},
    "Bob": {"Age": 30},  # Missing "City"
    "Charlie": {"City": "Chicago"}  # Missing "Age"
}

df = pd.DataFrame.from_dict(data, orient='index')

print(df)
```

#### **Output:**
```
          Age      City
Alice    25.0  New York
Bob      30.0       NaN
Charlie   NaN   Chicago
```
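Missing values are filled with `NaN` (Not a Number). Because `NaN` is a floating-point value, the `Age` column is stored as floating point, which is why it prints as `25.0` instead of `25`.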
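
After building a DataFrame with gaps, a common next step is to count or fill the missing values. The sketch below shows one way to do this; the placeholder `"Unknown"` is an arbitrary choice for illustration.

```python
import pandas as pd

data = {
    "Alice": {"Age": 25, "City": "New York"},
    "Bob": {"Age": 30},
    "Charlie": {"City": "Chicago"},
}
df = pd.DataFrame.from_dict(data, orient="index")

# Count missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# Fill only the City column; "Unknown" is an arbitrary placeholder
filled_df = df.fillna({"City": "Unknown"})
print(filled_df)
```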

---

### **Key Takeaways**
| Use Case | `orient='columns'` (default) | `orient='index'` |
|----------|--------------------|----------------|
| Dictionary keys become | **Column names** | **Row indices** |
| Values are treated as | **Column data** | **Row data** |
| Works well with | **Dictionary of lists** | **Dictionary of dictionaries** |
