**A DataFrame** is a two-dimensional, tabular data structure
in pandas, similar to a spreadsheet or SQL table.
It consists of rows, columns, and an index that labels the rows.

### Key Features:
1. Labeled axes. Columns have names and rows have an index, which can be numeric (the default) or made of custom labels.
2. Heterogeneous data. Different columns can store different data types.
3. Filtering, grouping, joining and reshaping data are supported.
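
The features listed above can be seen in a small sketch. The frame `people`, its values, and the row labels `"a"`, `"b"`, `"c"` are made up for illustration and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical sample data; the row labels "a", "b", "c" form a custom index.
people = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Chicago", "Chicago"],
    },
    index=["a", "b", "c"],
)

# Heterogeneous data: each column keeps its own data type.
print(people.dtypes)

# Filtering: keep only the rows where Age is greater than 28.
older = people[people["Age"] > 28]
print(older)

# Grouping: average Age per City.
mean_age = people.groupby("City")["Age"].mean()
print(mean_age)
```

The boolean expression `people["Age"] > 28` produces one True/False value per row, and indexing with it keeps the rows marked True.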



In [1]:
import pandas as pd

### DataFrames can be created in a few ways:

From a Dictionary 

In [2]:
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}
df = pd.DataFrame(data)
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


From a List of Dictionaries

In [3]:
data = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30, "City": "Los Angeles"},
    {"Name": "Charlie", "Age": 35, "City": "Chicago"}
]
df = pd.DataFrame(data)
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


From a List of Lists with Column Names

In [13]:
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df = pd.DataFrame(data, columns=["A", "B", "C"])
print(df)

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9


From a CSV or Excel File
(reading .xlsx files requires the openpyxl package: pip install openpyxl)

In [9]:
# From a CSV file
df_csv = pd.read_csv("data.csv")
print(df_csv)

# From an Excel file
df_xlsx = pd.read_excel("data.xlsx")
print(df_xlsx)

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
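
The cell above assumes that `data.csv` and `data.xlsx` already exist on disk. As a sketch that runs without any files, the same CSV parsing can be tried on an in-memory string. The variable `csv_text` below is a hypothetical stand-in for the contents of `data.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the contents of data.csv.
csv_text = "A,B,C\n1,2,3\n4,5,6\n7,8,9\n"

# read_csv accepts a path or a file-like object; the first line becomes the column names.
df_from_text = pd.read_csv(io.StringIO(csv_text))
print(df_from_text)

# to_csv with no path returns the CSV as a string; index=False leaves out the row labels.
round_trip = df_from_text.to_csv(index=False)
print(round_trip)
```

Passing `index=False` when writing is a common choice, because otherwise the row labels are saved as an extra unnamed column.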


Inspecting the DataFrame

In [15]:
print(df.head())  # First 5 rows by default

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
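
With only three rows, `head()` returns the whole frame, so its effect is easier to see on a longer one. A minimal sketch, using a hypothetical 10-row frame named `numbers`:

```python
import pandas as pd

# Hypothetical 10-row frame, long enough for head() to cut something off.
numbers = pd.DataFrame({"n": range(10), "square": [i * i for i in range(10)]})

print(numbers.head())   # first 5 rows (the default)
print(numbers.head(2))  # first 2 rows
print(numbers.tail(3))  # last 3 rows
print(numbers.shape)    # (number of rows, number of columns)
```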


In [16]:
df.info()  # info() prints its summary directly and returns None, so no print() is needed

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       3 non-null      int64
 1   B       3 non-null      int64
 2   C       3 non-null      int64
dtypes: int64(3)
memory usage: 204.0 bytes


In [17]:
print(df.describe())

         A    B    C
count  3.0  3.0  3.0
mean   4.0  5.0  6.0
std    3.0  3.0  3.0
min    1.0  2.0  3.0
25%    2.5  3.5  4.5
50%    4.0  5.0  6.0
75%    5.5  6.5  7.5
max    7.0  8.0  9.0


Accessing Data

In [19]:
print(df["B"])

0    2
1    5
2    8
Name: B, dtype: int64


In [20]:
print(df[["B", "A"]])

   B  A
0  2  1
1  5  4
2  8  7
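
The two cells above differ in what they return: a single column name gives a Series, while a list of names gives a DataFrame, even if the list has only one entry. A short sketch, rebuilding the same 3x3 data under the hypothetical name `df_demo`:

```python
import pandas as pd

# Same values as the list-of-lists example, rebuilt so this cell runs on its own.
df_demo = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["A", "B", "C"])

one_column = df_demo["B"]    # single label -> Series
as_frame = df_demo[["B"]]    # list of labels -> DataFrame with one column

print(type(one_column).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```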


In [24]:
# Row by integer position
print(df.iloc[0])  # First row, whatever its label is

A    1
B    2
C    3
Name: 0, dtype: int64


In [None]:
# Row by label
print(df.loc[0])  # Same row as above, because the default index labels are 0, 1, 2

A    1
B    2
C    3
Name: 0, dtype: int64
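
With the default index, `iloc[0]` and `loc[0]` happen to return the same row, which hides the difference between them. The sketch below uses a hypothetical frame `df_labeled` with the custom row labels `"x"`, `"y"`, `"z"` to separate the two.

```python
import pandas as pd

# Same values as before, but with custom row labels (an assumption for this example).
df_labeled = pd.DataFrame(
    {"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]},
    index=["x", "y", "z"],
)

# iloc selects by integer position, loc selects by label.
first_by_position = df_labeled.iloc[0]
first_by_label = df_labeled.loc["x"]
print(first_by_position.equals(first_by_label))

# Slicing differs: iloc excludes the stop position, loc includes the stop label.
print(df_labeled.iloc[0:2])     # rows "x" and "y"
print(df_labeled.loc["x":"y"])  # also rows "x" and "y"

# A single cell: row label first, then column label.
print(df_labeled.loc["y", "B"])
```

On this frame, `df_labeled.loc[0]` would raise a `KeyError`, because `0` is a position and not one of the labels.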
