# L_03: Creating DataFrames with Pandas

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.

pandas is a Python library for data analysis. It offers a number of data exploration, cleaning and transformation operations that are critical in working with data in Python.

pandas builds on NumPy, providing easy-to-use data structures and data manipulation functions with integrated indexing.
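
As a rough illustration of what labeled, mixed-type data with integrated indexing looks like in practice, the sketch below builds a small DataFrame from made-up values. The column names (`item`, `price`, `in_stock`) and the row labels are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each column holds a different type of value.
frame = pd.DataFrame(
    {
        "item": ["pen", "book", "lamp"],
        "price": np.array([1.5, 12.0, 30.25]),
        "in_stock": [True, False, True],
    },
    index=["a", "b", "c"],  # row labels instead of the default 0, 1, 2
)

# Each column keeps its own dtype.
print(frame.dtypes)

# Rows are selected by label, columns by name.
print(frame.loc["b", "item"])

# A numeric column can be handed back to NumPy as an array.
values = frame["price"].to_numpy()
print(type(values), values.sum())
```

The same label-based access works regardless of how the DataFrame was constructed, which is why the construction methods in the rest of this notebook are largely interchangeable.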

The main data structures pandas provides are Series and DataFrames. After a brief introduction to these two data structures and data ingestion, the key features of pandas this notebook covers are:

- Generating descriptive statistics on data
- Data cleaning using built-in pandas functions
- Frequent data operations for subsetting, filtering, insertion, deletion, and aggregation of data
- Merging multiple datasets using DataFrames
- Working with timestamps and time-series data

Additional Recommended Resources:

- pandas Documentation: http://pandas.pydata.org/pandas-docs/stable/
- Python for Data Analysis by Wes McKinney
- Python Data Science Handbook by Jake VanderPlas

Let's get started with our first pandas notebook!

In [None]:
import pandas as pd

# Creating DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
df
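
One detail worth knowing about the dictionary-of-lists form is that every list must have the same length. The sketch below, using a deliberately broken variant of the data above (the names `bad_data` and `fixed` are hypothetical), shows the error pandas raises and one possible way to work around it.

```python
import pandas as pd

# Hypothetical data in which 'Age' is one value short.
bad_data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30],
}

try:
    pd.DataFrame(bad_data)
    error_raised = False
except ValueError as exc:
    error_raised = True
    print(f"Could not build the DataFrame: {exc}")

# Padding the short list with None makes the lengths match;
# pandas then marks the missing entry as NaN.
fixed = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, None]}
)
print(fixed)
```

Padding with `None` is only one option; whether a missing value is acceptable depends on the dataset.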

### Creating DataFrame from a list of dictionaries

In [None]:
data_list = [
    {'Name': 'David', 'Age': 28, 'City': 'Seattle'},
    {'Name': 'Eva', 'Age': 22, 'City': 'Boston'}
]
df2 = pd.DataFrame(data_list)
df2
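
With a list of dictionaries, the records do not all need the same keys. The following sketch, with made-up records, suggests how pandas handles that case: the columns are the union of all keys, and missing entries are filled with NaN.

```python
import pandas as pd

# Hypothetical records with inconsistent keys.
records = [
    {"Name": "David", "Age": 28, "City": "Seattle"},
    {"Name": "Eva", "Age": 22},                             # no 'City'
    {"Name": "Gina", "City": "Denver", "Country": "USA"},   # no 'Age', extra key
]

df_records = pd.DataFrame(records)
print(df_records)

# Count the missing values in each column.
print(df_records.isna().sum())
```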

### Creating an empty DataFrame and adding data

In [None]:
df_empty = pd.DataFrame(columns=['Name', 'Age'])
df_empty.loc[0] = ['Frank', 40]
df_empty
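
Adding rows one at a time with `.loc` works for a handful of rows. When many rows are produced in a loop, a common alternative is to collect them in a plain Python list first and build the DataFrame once at the end. The sketch below assumes some made-up rows; the variable names are hypothetical.

```python
import pandas as pd

# Collect rows as dictionaries while looping over some source of data.
rows = []
for name, age in [("Frank", 40), ("Grace", 31), ("Heidi", 27)]:
    rows.append({"Name": name, "Age": age})

# Build the DataFrame in a single step.
df_rows = pd.DataFrame(rows, columns=["Name", "Age"])
print(df_rows)

# Because pandas sees all values at once, 'Age' gets an integer dtype,
# whereas the columns of an empty DataFrame start out as object dtype.
print(df_rows.dtypes)
```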

### Creating a DataFrame from a list of lists

In [None]:
# Import pandas library
import pandas as pd

# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]

# Create the pandas DataFrame, one inner list per row
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Display the DataFrame
df
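
The `columns` argument is optional. As a small sketch of what happens without it, the example below builds the same rows with no column names and then assigns them afterwards; `df_default` and `default_labels` are names introduced only for this illustration.

```python
import pandas as pd

rows = [["tom", 10], ["nick", 15], ["juli", 14]]

# Without `columns`, pandas labels the columns with integers starting at 0.
df_default = pd.DataFrame(rows)
default_labels = df_default.columns.tolist()
print(default_labels)

# Column labels can be assigned after construction.
df_default.columns = ["Name", "Age"]
print(df_default)
```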


### Creating a DataFrame from a dictionary with explicit index labels

In [None]:
import pandas as pd

# Dictionary of lists: one list per column.
data = {'Name': ['Tom', 'Jack', 'nick', 'juli'],
        'marks': [99, 98, 95, 90]}

# Create the DataFrame, replacing the default 0..3 index with custom labels.
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])

# Display the DataFrame
df
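
Custom index labels mainly pay off when selecting rows. The sketch below rebuilds the same data under the hypothetical name `marks` and contrasts label-based selection (`.loc`) with position-based selection (`.iloc`).

```python
import pandas as pd

marks = pd.DataFrame(
    {"Name": ["Tom", "Jack", "nick", "juli"], "marks": [99, 98, 95, 90]},
    index=["rank1", "rank2", "rank3", "rank4"],
)

# .loc selects by index label.
second = marks.loc["rank2", "Name"]

# .iloc selects by integer position (row 1, column 0).
second_by_pos = marks.iloc[1, 0]

# Slicing by label includes both endpoints.
top_two = marks.loc["rank1":"rank2"]

print(second, second_by_pos)
print(top_two)
```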



### Creating a DataFrame using the zip() function


In [None]:
import pandas as pd

# List1
Name = ['tom', 'krish', 'nick', 'juli']

# List2
Age = [25, 30, 26, 22]

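# Pair the two lists element by element with zip()
# and collect the resulting tuples in a list.
data = list(zip(Name, Age))

# Only the last expression of a notebook cell is displayed,
# so print the intermediate list explicitly.
print(data)

# Convert the list of tuples into a pandas DataFrame.
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Display the DataFrame.
df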
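
A caveat with `zip()` is that it stops at the shortest input without raising an error, so a list that is one value short silently drops a row. The sketch below uses made-up lists of unequal length to show this, along with `itertools.zip_longest` from the standard library as one possible way to keep every row.

```python
from itertools import zip_longest

import pandas as pd

names = ["tom", "krish", "nick", "juli"]
ages = [25, 30, 26]  # hypothetical: one value is missing

# zip() stops at the shortest list, so 'juli' is dropped without warning.
df_short = pd.DataFrame(list(zip(names, ages)), columns=["Name", "Age"])
print(df_short)

# zip_longest() pads the shorter list with None, which pandas shows as NaN.
df_padded = pd.DataFrame(list(zip_longest(names, ages)), columns=["Name", "Age"])
print(df_padded)
```

Comparing `len(names)` and `len(ages)` before zipping is a simple guard when a mismatch should be treated as an error.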


### Creating a DataFrame from a Series

In [None]:
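# pandas DataFrame from a Series.

import pandas as pd
a = [10, 20, 30, 40]
# Wrap the list in a Series.
data = pd.Series(a)

# Convert the Series into a one-column DataFrame.
# The column name 'value' is an arbitrary choice.
df = data.to_frame(name='value')
df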



### Creating a DataFrame from a dictionary of Series


In [None]:
import pandas as pd

# Dictionary of Series: each Series becomes one column.
data = {'one': pd.Series([10, 20, 30], index=['a', 'b', 'c']),
        'two': pd.Series([40, 50, 60], index=['a', 'b', 'c'])}

# Create the DataFrame.
df = pd.DataFrame(data)

# Display the DataFrame.
df
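
In the cell above both Series share the same index. When the indexes differ, pandas aligns the Series by label rather than by position. The following sketch, with a deliberately shifted index on the second Series, illustrates this; the variable names are introduced only for this example.

```python
import pandas as pd

one = pd.Series([10, 20, 30], index=["a", "b", "c"])
two = pd.Series([40, 50, 60], index=["b", "c", "d"])  # shifted labels

# The resulting index is the union of both indexes;
# labels missing from one Series are filled with NaN.
aligned = pd.DataFrame({"one": one, "two": two})
print(aligned)
```

Here 40 lands in row `b`, not in the first row, because alignment follows the labels.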


Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

In [None]:
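import pandas as pd
import numpy as np

# Seven consecutive days starting on 2024-01-01.
dates = pd.date_range("20240101", periods=7)
print(dates)

# Random values, so the numbers differ on every run.
df = pd.DataFrame(data=np.random.randn(7, 4), index=dates, columns=list("ABCD"))
df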
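
Two follow-ups are often useful with this kind of DataFrame: making the random values reproducible and selecting rows by date. The sketch below assumes NumPy's `default_rng` generator with an arbitrarily chosen seed instead of `np.random.randn`; the name `df_dates` is hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20240101", periods=7)  # daily frequency by default

# A seeded generator gives the same numbers on every run (seed is arbitrary).
rng = np.random.default_rng(seed=0)
df_dates = pd.DataFrame(
    rng.standard_normal((7, 4)), index=dates, columns=list("ABCD")
)

# Select a single day by its timestamp.
row = df_dates.loc[pd.Timestamp("2024-01-03")]

# Slice a range of dates; both endpoints are included.
first_three = df_dates.loc["2024-01-01":"2024-01-03"]

print(row)
print(first_three)
```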

Creating a DataFrame by passing a dictionary of objects that can be converted into a series-like structure:

In [None]:
df2 = pd.DataFrame(
    {
        "A": 1,
        "B": pd.Timestamp("20220330"),
        "C": pd.Series(1, index=list(range(4, 8)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["Nice", 45, "Done", "Good"]),
        "F": pd.Series([1, 2, 3, 4], index=list(range(4, 8))),
        "G": pd.date_range("20230101", periods=4),
    }
)
df2
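
The key behavior in this form is that scalar values are repeated for every row, while the row index comes from the Series in the dictionary. A reduced sketch of the same idea, using a hypothetical DataFrame named `mixed` with fewer columns, makes this easier to inspect:

```python
import numpy as np
import pandas as pd

mixed = pd.DataFrame(
    {
        "A": 1.0,                                         # scalar, repeated per row
        "B": pd.Timestamp("20220330"),                    # scalar timestamp, repeated
        "C": pd.Series([1, 2, 3, 4], index=range(4, 8)),  # supplies the row index
        "D": np.array([3] * 4, dtype="int32"),            # must match the row count
    }
)

print(mixed)

# Each column keeps the dtype of the object it was built from.
print(mixed.dtypes)
```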



In [1]:
import pandas as pd

# Creating DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
df

|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |


### Creating DataFrame from a list of dictionaries

In [2]:
data_list = [
    {'Name': 'David', 'Age': 28, 'City': 'Seattle'},
    {'Name': 'Eva', 'Age': 22, 'City': 'Boston'}
]
df2 = pd.DataFrame(data_list)
df2

|   | Name | Age | City |
|---|---|---|---|
| 0 | David | 28 | Seattle |
| 1 | Eva | 22 | Boston |


### Creating a DataFrame from a list of lists



In [3]:
# Import pandas library
import pandas as pd

# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns =['Name', 'Age'])

# print dataframe.
df


|   | Name | Age |
|---|---|---|
| 0 | tom | 10 |
| 1 | nick | 15 |
| 2 | juli | 14 |


### Creating a DataFrame from a dictionary with explicit index labels

In [4]:
import pandas as pd

# Dictionary of lists: one list per column.
data = {'Name': ['Tom', 'Jack', 'nick', 'juli'],
        'marks': [99, 98, 95, 90]}

# Create the DataFrame, replacing the default 0..3 index with custom labels.
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])

# Display the DataFrame
df


|   | Name | marks |
|---|---|---|
| rank1 | Tom | 99 |
| rank2 | Jack | 98 |
| rank3 | nick | 95 |
| rank4 | juli | 90 |


### Creating a DataFrame using the zip() function

In [5]:
import pandas as pd

# List1
Name = ['tom', 'krish', 'nick', 'juli']

# List2
Age = [25, 30, 26, 22]

# get the list of tuples from two lists.
# and merge them by using zip().
data = list(zip(Name, Age))


data

# Converting data into pandas Dataframe.
df = pd.DataFrame(data,columns=['Name', 'Age'])

# Print data.
df


|   | Name | Age |
|---|---|---|
| 0 | tom | 25 |
| 1 | krish | 30 |
| 2 | nick | 26 |
| 3 | juli | 22 |


### Creating a DataFrame from a Series

In [6]:
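# pandas Series that can serve as a DataFrame column.

import pandas as pd
a = [10, 20, 30, 40]
# Wrap the list in a Series.
data = pd.Series(a)
# This cell displays the Series itself; data.to_frame() would
# convert it into a one-column DataFrame.
data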



0    10
1    20
2    30
3    40
dtype: int64

### Creating a DataFrame from a dictionary of Series

In [7]:
import pandas as pd

# Dictionary of Series: each Series becomes one column.
data = {'one': pd.Series([10, 20, 30], index=['a', 'b', 'c']),
        'two': pd.Series([40, 50, 60], index=['a', 'b', 'c'])}

# Create the DataFrame.
df = pd.DataFrame(data)

# Display the DataFrame.
df

|   | one | two |
|---|---|---|
| a | 10 | 40 |
| b | 20 | 50 |
| c | 30 | 60 |


Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:



In [8]:
import pandas as pd 
import numpy as np
dates = pd.date_range( "20240101", periods=7)
dates

df = pd.DataFrame(data=np.random.randn(7, 4), index=dates, columns=list("ABCD"))
df

|   | A | B | C | D |
|---|---|---|---|---|
| 2024-01-01 | 0.413226 | -0.225865 | 1.908024 | -0.786649 |
| 2024-01-02 | 0.735362 | 0.609896 | 0.563017 | -0.034371 |
| 2024-01-03 | 0.616108 | -0.560597 | 0.22837 | -2.135003 |
| 2024-01-04 | 0.766504 | -0.337294 | -0.374272 | -0.25483 |
| 2024-01-05 | 1.966979 | 1.7593 | -0.469377 | 1.576319 |
| 2024-01-06 | 0.538068 | -0.1797 | 0.067105 | -0.698732 |
| 2024-01-07 | -0.233091 | -1.25757 | -0.235078 | -0.514196 |


Creating a DataFrame by passing a dictionary of objects that can be converted into a series-like structure:



In [None]:
df2 = pd.DataFrame(
    {
        "A": 1,
        "B": pd.Timestamp("20220330"),
        "C": pd.Series(1, index=list(range(4, 8)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["Nice", 45, "Done", "Good"]),
        "F": pd.Series([1, 2, 3, 4], index=list(range(4, 8))),
        "G": pd.date_range("20230101", periods=4),
    }
)
df2
