#DataFrame

##What is a Pandas DataFrame?
A **DataFrame** is a **two-dimensional labeled data structure** whose columns can each hold a different type. You can think of it as a spreadsheet or a SQL table: every row and every column has a label, stored in the row index and the column index respectively.

##How to Create a DataFrame
There are several ways to create a DataFrame in Pandas. Here are some common methods:

1. From a Dictionary of Lists or NumPy Arrays:

In [1]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 35],
    'City': ['New York', 'London', 'Paris', 'Tokyo']
}

df = pd.DataFrame(data)
df

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
3    David   35     Tokyo


In this case, the keys of the dictionary become the column labels, and the lists become the column data. Because no row labels were supplied, Pandas assigns a default integer index (a `RangeIndex`) starting from 0.
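
The heading above also mentions NumPy arrays, so here is a minimal sketch of that variant. The column name `Height_cm`, its values, and the choice of names as row labels are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each NumPy array becomes one column.
array_data = {
    "Age": np.array([25, 30, 22, 35]),
    "Height_cm": np.array([165.0, 180.5, 172.3, 177.8]),
}

# Passing `index` replaces the default 0..n-1 row labels with custom ones.
df_arrays = pd.DataFrame(array_data, index=["Alice", "Bob", "Charlie", "David"])

print(df_arrays)
print(df_arrays.dtypes)  # each column keeps the dtype of its source array
```

All arrays must have the same length, and the `index` list must match that length too.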

2. From a List of Dictionaries:

In [2]:
import pandas as pd

data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'London'},
    {'Name': 'Charlie', 'Age': 22, 'City': 'Paris'},
    {'Name': 'David', 'Age': 35, 'City': 'Tokyo'}
]

df = pd.DataFrame(data)
df

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   22     Paris
3    David   35     Tokyo


Here, each dictionary in the list represents a row in the DataFrame. The keys of the dictionaries become the column labels.
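
One consequence of building rows from dictionaries is that the dictionaries do not need identical keys. The sketch below uses a shortened, hypothetical version of the data above in which one record has no `City` key; Pandas fills the gap with a missing value.

```python
import pandas as pd

# Hypothetical records: the second dictionary has no 'City' key.
records = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30},
]

df_records = pd.DataFrame(records)

print(df_records)
# pd.isna reports whether a cell holds a missing value.
print(pd.isna(df_records.loc[1, "City"]))
```

The columns are the union of all keys seen across the records, so a misspelled key in one record silently creates an extra, mostly empty column.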

3. From a Series (creating a DataFrame with one column):

In [3]:
import pandas as pd

series_data = pd.Series([10, 20, 30, 40], name='Values')
df_from_series = pd.DataFrame(series_data)
df_from_series

   Values
0      10
1      20
2      30
3      40
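
The column label in this result comes from the Series' `name`. As a small sketch of how that plays out, the snippet below compares `Series.to_frame()` with the `pd.DataFrame(...)` constructor; the variable names and the label `Amount` are made up for illustration.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40], name="Values")

# to_frame() is equivalent to pd.DataFrame(values) for a single Series.
df_named = values.to_frame()

# The column label can be overridden while converting.
df_renamed = values.to_frame(name="Amount")

# A Series without a name produces a column labeled with the integer 0.
unnamed = pd.Series([10, 20, 30, 40])
df_unnamed = pd.DataFrame(unnamed)

print(df_named.columns.tolist())
print(df_renamed.columns.tolist())
print(df_unnamed.columns.tolist())
```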


##Basic Attributes of a DataFrame
Like Series, DataFrames expose several useful attributes and methods:
* `.index`: The row index labels.
* `.columns`: The column labels.
* `.values`: A NumPy ndarray of the data in the DataFrame (without index or column labels). The pandas documentation recommends the `.to_numpy()` method for new code.
* `.dtypes`: The data type of each column.
* `.shape`: A tuple representing the dimensions of the DataFrame (number of rows, number of columns).
* `.size`: The total number of elements in the DataFrame (number of rows * number of columns).
* `.info()`: A method that prints a concise summary of the DataFrame, including column data types, non-null counts, and memory usage.
* `.describe()`: A method that returns descriptive statistics (count, mean, std, min, quartiles, max). By default it covers only the numerical columns.

Let's see some of these in action with the `df` we created earlier:

In [4]:
print(f"Index: {df.index}")
print(f"\nColumns: {df.columns}")
print(f"\nValues:\n{df.values}")
print(f"\nData Types:\n{df.dtypes}")
print(f"\nShape: {df.shape}")
print(f"\nSize: {df.size}")
print("\nInfo:")
df.info()  # prints directly, so it is not wrapped in print()
print("\nDescription of numerical columns:")
print(df.describe())

Index: RangeIndex(start=0, stop=4, step=1)

Columns: Index(['Name', 'Age', 'City'], dtype='object')

Values:
[['Alice' 25 'New York']
 ['Bob' 30 'London']
 ['Charlie' 22 'Paris']
 ['David' 35 'Tokyo']]

Data Types:
Name    object
Age      int64
City    object
dtype: object

Shape: (4, 3)

Size: 12

Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    4 non-null      object
 1   Age     4 non-null      int64 
 2   City    4 non-null      object
dtypes: int64(1), object(2)
memory usage: 228.0+ bytes

Description of numerical columns:
             Age
count   4.000000
mean   28.000000
std     5.715476
min    22.000000
25%    24.250000
50%    27.500000
75%    31.250000
max    35.000000
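
Two details in the output above are worth a closer look: the `Values` array mixes strings and integers, and `describe()` reported only `Age`. The following sketch rebuilds the same data under the hypothetical name `people` and uses `.to_numpy()` and `describe(include="all")` to show both effects.

```python
import pandas as pd

# Same data as the earlier example, rebuilt so the snippet is self-contained.
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 35],
    "City": ["New York", "London", "Paris", "Tokyo"],
})

# Mixed column types force a generic object array for the whole frame.
mixed = people.to_numpy()
print(mixed.dtype)

# A single numeric column converts to a numeric array instead.
ages = people["Age"].to_numpy()
print(ages.dtype)

# include="all" adds the non-numeric columns (count, unique, top, freq).
summary = people.describe(include="all")
print(summary)
```

Statistics that do not apply to a column, such as the mean of `Name`, appear as `NaN` in the combined summary.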
