Creating DataFrames
Let's look at different ways to create a pandas DataFrame, the core data structure for most data science work in Python.

From Python Lists
import pandas as pd
 
data = [
    ["Alice", 25],
    ["Bob", 30],
    ["Charlie", 35]
]
 
df = pd.DataFrame(data, columns=["Name", "Age"])
print(df)

From Dictionary of Lists
This is the most common and readable format:

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35]
}
 
df = pd.DataFrame(data)

Each key becomes a column, and each list is the column data.
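As a quick check of that mapping, the sketch below rebuilds the same frame and inspects it. It also illustrates a point the text only implies: the lists must all have the same length, otherwise pandas raises a ValueError. The variable names are illustrative only.

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
}
df = pd.DataFrame(data)

# Keys become column labels, in insertion order.
column_names = list(df.columns)

# Each list becomes the values of its column.
ages = df["Age"].tolist()

# Lists of unequal length are rejected with a ValueError.
try:
    pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25]})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

print(column_names)
print(ages)
print(mismatch_error)
```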

From NumPy Arrays
import numpy as np
 
arr = np.array([[1, 2], [3, 4]])
df = pd.DataFrame(arr, columns=["A", "B"])

Pass column names explicitly; without them, pandas labels the columns 0, 1, 2, and so on.
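To see why the column names matter, here is a small comparison of the same array loaded with and without them. The names unnamed and named are illustrative only.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])

# Without column names, pandas falls back to integer labels.
unnamed = pd.DataFrame(arr)

# With explicit names, columns can be selected by a meaningful label.
named = pd.DataFrame(arr, columns=["A", "B"])

print(list(unnamed.columns))
print(list(named.columns))
print(named["B"].tolist())
```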

From CSV Files
df = pd.read_csv("data.csv")

Useful options include sep, header, names, index_col, usecols, and nrows.

Example:

pd.read_csv("data.csv", usecols=["Name", "Age"])
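The following sketch combines several of these options in one call. To stay self-contained it reads from an in-memory buffer instead of a file; the semicolon-separated sample data and the City column are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data standing in for a file on disk.
raw = io.StringIO(
    "Name;Age;City\n"
    "Alice;25;Paris\n"
    "Bob;30;Berlin\n"
    "Charlie;35;Madrid\n"
)

df = pd.read_csv(
    raw,
    sep=";",                  # field delimiter (the default is ",")
    usecols=["Name", "Age"],  # load only these columns
    index_col="Name",         # use the Name column as the row index
    nrows=2,                  # read only the first two data rows
)
print(df)
```

Passing a file path such as "data.csv" in place of the buffer works the same way.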

From Excel Files
df = pd.read_excel("data.xlsx")

Reading .xlsx files requires the openpyxl package (legacy .xls files need xlrd instead):

pip install openpyxl

From JSON
df = pd.read_json("data.json")

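read_json also accepts a URL or a file-like object. To parse a JSON string, wrap it in io.StringIO, since recent pandas versions deprecate passing the raw string directly.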
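As a minimal sketch of reading JSON held in a string, the example below wraps the text in io.StringIO before handing it to read_json. It also shows the lines=True option for JSON Lines input, where each line is a separate JSON object. The sample records are made up for illustration.

```python
import io
import pandas as pd

# A JSON array of records held in a string.
json_text = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'
df = pd.read_json(io.StringIO(json_text))

# JSON Lines: one JSON object per line, read with lines=True.
jsonl_text = '{"Name": "Alice", "Age": 25}\n{"Name": "Bob", "Age": 30}\n'
df_lines = pd.read_json(io.StringIO(jsonl_text), lines=True)

print(df)
print(df_lines)
```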

From SQL Databases
import sqlite3
 
conn = sqlite3.connect("mydb.sqlite")
df = pd.read_sql("SELECT * FROM users", conn)
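The query above only works if mydb.sqlite already contains a users table. The sketch below is a self-contained variant: it creates a throwaway in-memory database, fills a users table with made-up rows, and reads a filtered result. The table layout (name, age) is an assumption for illustration.

```python
import sqlite3
import pandas as pd

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Alice", 25), ("Bob", 30), ("Charlie", 35)],
)
conn.commit()

# Use a placeholder and params rather than formatting values into the SQL string.
df = pd.read_sql("SELECT * FROM users WHERE age >= ?", conn, params=(30,))
conn.close()

print(df)
```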

From the Web (Example: CSV from URL)
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"
df = pd.read_csv(url)

EDA (Exploratory Data Analysis)
Exploratory Data Analysis (EDA) is an essential first step in any data science project.

It involves taking a deep look at the dataset to understand its structure, spot patterns, identify anomalies, and uncover relationships between variables. This process includes generating summary statistics, checking for missing or duplicate data, and creating visualizations like histograms, box plots, and scatter plots. The goal of EDA is to get a clear picture of what the data is telling you before applying any analysis or machine learning models.

By exploring the data thoroughly, you can make better decisions about how to clean, transform, and model it effectively.

Once your DataFrame is ready, run these to understand your data:

df.head()         # First 5 rows
df.tail()         # Last 5 rows
df.info()         # Column info: types, non-nulls
df.describe()     # Stats for numeric columns
df.columns        # Column labels
df.shape          # (rows, columns)
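The checks for missing and duplicate data mentioned earlier can be sketched like this. The small DataFrame is made up so that it contains one duplicate row and one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one repeated row and one missing age.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Charlie"],
    "Age": [25.0, 30.0, 30.0, np.nan],
})

# Count missing values in each column.
missing_per_column = df.isna().sum()

# Count rows that exactly repeat an earlier row.
duplicate_rows = int(df.duplicated().sum())

# One possible cleanup: drop repeated rows, then rows with missing values.
cleaned = df.drop_duplicates().dropna()

print(missing_per_column)
print(duplicate_rows)
print(cleaned)
```

Whether to drop or fill missing values depends on the dataset; dropping is used here only to keep the example short.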

Summary
You can create DataFrames from lists, dictionaries, NumPy arrays, files (CSV, Excel, JSON), SQL databases, and URLs
Use .head(), .info(), .describe() to quickly explore any dataset

In [8]:
import pandas as pd

# From Python Lists

data = [
    ["Tasmir Khan", 25],
    ["Arhaan Khan", 21],
    ["Abbas Khan", 23]
]

df = pd.DataFrame(data, columns=["Name", "Age"])
print(df)

# Replace the default 0-based index with 1-based labels
df.index = [1, 2, 3]
print("\n", df)

          Name  Age
0  Tasmir Khan   25
1  Arhaan Khan   21
2   Abbas Khan   23

           Name  Age
1  Tasmir Khan   25
2  Arhaan Khan   21
3   Abbas Khan   23


In [12]:
# From Dictionary of Lists
import pandas as pd

data = {
    "name": ["Tasmir", "Sujal", "Aman", "Doraemon"],
    "Age": [21, 21, 24, 15]
}

df = pd.DataFrame(data)
print(df)

# Relabel the rows (1-based) and rename the columns (Hindi/Urdu for "Name" and "Age")
df.index = [1, 2, 3, 4]
df.columns = ["Naam", "Umr"]
print("\n", df)

       name  Age
0    Tasmir   21
1     Sujal   21
2      Aman   24
3  Doraemon   15

        Naam  Umr
1    Tasmir   21
2     Sujal   21
3      Aman   24
4  Doraemon   15


In [24]:
# From NumPy arrays
import numpy as np
import pandas as pd

# randint already returns an ndarray: a 3x3 array of integers from 1 to 9
arr = np.random.randint(1, 10, (3, 3))
print("OG Array is :-\n", arr)

df = pd.DataFrame(arr, columns=["A", "B", "C"])
print("\n", df)

# Relabel rows and columns in place
df.index = ["R1", "R2", "R3"]
df.columns = ["C1", "C2", "C3"]
print("\n", df)


OG Array is :-
 [[8 6 8]
 [8 6 1]
 [5 7 1]]

    A  B  C
0  8  6  8
1  8  6  1
2  5  7  1

     C1  C2  C3
R1   8   6   8
R2   8   6   1
R3   5   7   1


In [34]:
# From CSV files

import pandas as pd

df = pd.read_csv("currency.csv")
print(df)

df2 = pd.read_csv("agreement.csv")
print("\n",df2)

# df3 = pd.read_csv("country_full.csv")
# print("\n",df3)

    Code Symbol                      Name
0    AED    د.إ    United Arab Emirates d
1    AFN      ؋            Afghan afghani
2    ALL      L              Albanian lek
3    AMD    AMD             Armenian dram
4    ANG      ƒ  Netherlands Antillean gu
..   ...    ...                       ...
158  XOF    CFA    West African CFA franc
159  XPF     Fr                 CFP franc
160  YER      ﷼               Yemeni rial
161  ZAR      R        South African rand
162  ZMW     ZK            Zambian kwacha

[163 rows x 3 columns]

            Agreement
0     Strongly Agree
1              Agree
2            Neutral
3           Disagree
4  Strongly Disagree


In [39]:
# From Excel files
!pip install openpyxl
import time
import pandas as pd

# Time how long the file takes to load
start = time.time()
df = pd.read_excel("1mb.xlsx")
end = time.time()
print("\n Read time", end-start)

# Time how long the DataFrame takes to print
s2 = time.time()
print(df)
e2 = time.time()
print("\nPrinting time:-",e2-s2)


 Read time 0.5637860298156738
                       Name                        Email            Phone  \
0             Austen Russel     rubye.bernhard@gmail.com     463.769.1464   
1             Darion Marvin             orpha24@feil.com  +1-865-703-2210   
2           Lenny Jaskolski    flatley.cristal@yahoo.com   1-743-426-5249   
3     Dr. Demarco Nolan III   demetrius.harber@gmail.com  +1-281-763-9451   
4               Cyrus Braun     hills.schuyler@jerde.com     857-587-8694   
...                     ...                          ...              ...   
3995        Wilfred Trantow   serenity.metz@weissnat.net     463.381.0970   
3996         Mrs. Gail Dare      swift.kathryn@gmail.com  +1-281-443-4721   
3997        Shawn Crist III      muller.kaylee@yahoo.com   1-608-368-4133   
3998           Alison Crist    matilda.crist@hotmail.com   (458) 239-0396   
3999        Geovanny Heaney  heaven.runolfsson@gmail.com   1-848-789-7748   

                                            

In [45]:
# From JSON
import pandas as pd

# lines=True reads JSON Lines: one JSON object per line
df = pd.read_json("weather.json", lines=True)
print(df)

     MinTemp  MaxTemp  Rainfall  Evaporation Sunshine WindGustDir  \
0        8.0     24.3       0.0          3.4      6.3          NW   
1       14.0     26.9       3.6          4.4      9.7         ENE   
2       13.7     23.4       3.6          5.8      3.3          NW   
3       13.3     15.5      39.8          7.2      9.1          NW   
4        7.6     16.1       2.8          5.6     10.6         SSE   
..       ...      ...       ...          ...      ...         ...   
361      9.0     30.7       0.0          7.6     12.1         NNW   
362      7.1     28.4       0.0         11.6     12.7           N   
363     12.5     19.9       0.0          8.4      5.3         ESE   
364     12.5     26.9       0.0          5.0      7.1          NW   
365     12.3     30.2       0.0          6.0     12.6          NW   

    WindGustSpeed WindDir9am WindDir3pm WindSpeed9am  ...  Humidity3pm  \
0              30         SW         NW            6  ...           29   
1              39      

In [47]:
# From SQL databases
import sqlite3

conn = sqlite3.connect("mydb.sqlite")
# The query below fails: this database has no "users" table.
df = pd.read_sql("SELECT*FROM users", conn)


DatabaseError: Execution failed on sql 'SELECT*FROM users': no such table: users
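This error means the database has no users table. Note that sqlite3.connect creates a new, empty database when the file does not exist, so a mistyped path produces the same message. One way to diagnose it is to list the tables before querying. The sketch below uses an empty in-memory database as a stand-in for mydb.sqlite.

```python
import sqlite3
import pandas as pd

# An empty in-memory database stands in for the file used above.
conn = sqlite3.connect(":memory:")

# sqlite_master is SQLite's built-in catalog of tables, indexes, and views.
tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type = 'table'", conn)
has_users = "users" in tables["name"].tolist()
conn.close()

print(tables)
print(has_users)
```

If has_users is False, create and populate the table first, or check that the path points at the intended database file.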

In [49]:
# From the web (Example: CSV from URL)

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"

df = pd.read_csv(url)
print(df)

     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[244 rows x 7 columns]
