# What is Pandas, and why is it used in Python?
Answer:
Pandas is a powerful Python library for data manipulation and analysis. It provides data structures like Series and DataFrame to efficiently handle and analyze structured data. It is widely used in data science, machine learning, and data wrangling due to its flexibility and ease of use.
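
As a minimal sketch of the two structures named above (the names `ages` and `people` and their values are illustrative assumptions, not from the original):

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([25, 30], index=['Alice', 'Bob'], name='Age')

# A DataFrame is a two-dimensional table whose columns are Series.
people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

print(ages['Bob'])            # label-based lookup on a Series
print(people['Age'].mean())   # column-wise aggregation on a DataFrame
```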

In [14]:
# How do you create a Pandas DataFrame from a dictionary?
import pandas as pd  

data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}  
df = pd.DataFrame(data)  
print(df)

    Name  Age
0  Alice   25
1    Bob   30


In [15]:
# How do you check for missing values in a DataFrame?
df.isnull().sum()  # Returns the count of missing values in each column


Name    0
Age     0
dtype: int64
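
The frame above has no gaps, so every count is zero. A small sketch with a hypothetical frame (`scores`, invented here for illustration) shows what the counts look like when values are actually missing:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps; None and NaN both count as missing.
scores = pd.DataFrame({'Name': ['Alice', 'Bob', None],
                       'Age': [25, np.nan, np.nan]})

missing_per_column = scores.isnull().sum()
print(missing_per_column)

# True if any value anywhere in the frame is missing.
has_missing = bool(scores.isnull().values.any())
print(has_missing)
```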

In [16]:
# What is the difference between loc[] and iloc[]?
# Answer:

# loc[] selects data using labels (e.g., column/row names).

# iloc[] selects data using integer positions (row and column indices).
display(df.loc[0, 'Name'])  # Selects by label  
df.iloc[0, 0]  # Selects by position


'Alice'

'Alice'
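
Both calls return the same value above only because the default index labels (0, 1) happen to match the positions. A sketch with a hypothetical frame whose labels differ from its positions makes the distinction visible:

```python
import pandas as pd

# Index labels deliberately differ from row positions.
staff = pd.DataFrame({'Name': ['Alice', 'Bob', 'Cara'], 'Age': [25, 30, 35]},
                     index=[10, 20, 30])

by_label = staff.loc[10, 'Name']     # row labeled 10
by_position = staff.iloc[0, 0]       # first row, first column

# Slices differ too: loc includes the end label, iloc excludes the end position.
label_slice = staff.loc[10:20]       # rows labeled 10 and 20
position_slice = staff.iloc[0:1]     # only the first row
```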

In [17]:
# How do you filter rows in a DataFrame based on a condition?
df[df['Age'] > 25]


  Name  Age
1  Bob   30


In [18]:
# How do you drop a column from a DataFrame?
df_no_age = df.drop(columns=['Age'])  # returns a new DataFrame; df keeps 'Age' for the cells below

In [None]:
# What does the describe() function do?
df.describe()
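
describe() returns summary statistics (count, mean, std, min, quartiles, max) for numeric columns by default. A short sketch, rebuilding the two-row frame from the earlier cell under the assumed name `people`:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

summary = people.describe()   # numeric columns only by default
print(summary)

# include='all' also summarizes non-numeric columns (count, unique, top, freq).
full_summary = people.describe(include='all')
print(full_summary)
```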

In [None]:
# What are MultiIndex DataFrames, and how do you create one?
# A MultiIndex DataFrame has multiple levels of indexing (hierarchical indexing) on its rows or columns.
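
The creation code for this cell is not present in the source. One way to build a frame with `Category` and `Number` index levels and a `Value` column is sketched below; the use of `MultiIndex.from_tuples` and the name `multi_df` are assumptions, not necessarily what the author ran:

```python
import pandas as pd

# Build a two-level row index from (Category, Number) pairs.
index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Category', 'Number'],
)
multi_df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)
print(multi_df)

# Select a single cell with a tuple of labels, one per level.
a2_value = multi_df.loc[('A', 2), 'Value']

# Select everything under one outer label.
b_total = multi_df.loc['B', 'Value'].sum()
```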



                 Value
Category Number       
A        1          10
         2          20
B        1          30
         2          40


In [None]:
# What is the difference between sort_values() and sort_index()?
# Answer:

# sort_values() sorts based on column values.

# sort_index() sorts based on index labels.
df.sort_values(by='Age')  
df.sort_index()
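
A sketch on a hypothetical frame whose index is out of order, so the two methods give visibly different results. Both return a new DataFrame and leave the original unchanged:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Bob', 'Alice', 'Cara'], 'Age': [30, 25, 35]},
                      index=[2, 0, 1])

by_age = people.sort_values(by='Age')   # order rows by the Age column
by_index = people.sort_index()          # order rows by index label

print(by_age)
print(by_index)
```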


In [None]:
# What is the pivot_table() function, and how is it different from groupby()?
# Answer:

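# pivot_table() aggregates and reshapes: the values of one column become the row index, and optionally another column's values become the output columns.

# groupby() splits rows into groups and aggregates them, but returns the result in long form without spreading values across columns.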

df.pivot_table(index='Category', values='Sales', aggfunc='sum')
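
To compare the two side by side, here is a sketch on a hypothetical `sales` frame (the `Region` column and all values are invented for illustration):

```python
import pandas as pd

sales = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Region':   ['East', 'West', 'East', 'West'],
    'Sales':    [100, 150, 200, 50],
})

# groupby: one row per (Category, Region) pair, in long form.
long_form = sales.groupby(['Category', 'Region'])['Sales'].sum()

# pivot_table: same totals, but Region values become columns.
wide_form = sales.pivot_table(index='Category', columns='Region',
                              values='Sales', aggfunc='sum')

print(long_form)
print(wide_form)
```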


In [None]:
# What is vectorization in Pandas, and why is it faster than loops?
# Answer:
# Vectorization applies an operation to whole columns (arrays) at once instead of looping over elements in Python. It is faster because the work runs in compiled NumPy code, avoiding per-element interpreter overhead.
df['new_col'] = df['col1'] + df['col2']  # Vectorized  
df['new_col'] = [x + y for x, y in zip(df['col1'], df['col2'])]  # Slower loop
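
A rough way to check the claim on your own machine is to time both versions with `timeit`. The frame `nums` and its size are assumptions, and no timings are quoted because they vary by hardware and library version:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data; col1 and col2 match the column names used above.
nums = pd.DataFrame({'col1': np.arange(100_000), 'col2': np.arange(100_000)})

def vectorized():
    return nums['col1'] + nums['col2']

def python_loop():
    return pd.Series([x + y for x, y in zip(nums['col1'], nums['col2'])])

# Both approaches produce the same values.
same = bool((vectorized() == python_loop()).all())

# Total seconds for 10 runs of each approach.
print(timeit.timeit(vectorized, number=10))
print(timeit.timeit(python_loop, number=10))
```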


In [None]:
# How do you handle categorical data efficiently in Pandas?
# Answer:
# Convert categorical columns to category type.
df['Category'] = df['Category'].astype('category')
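
The saving comes from storing each distinct value once and keeping small integer codes per row. A sketch with a hypothetical column of three repeated labels compares the memory of both representations:

```python
import pandas as pd

# Hypothetical column: few distinct values repeated many times.
labels = pd.Series(['red', 'green', 'blue'] * 10_000)
as_category = labels.astype('category')

plain_bytes = labels.memory_usage(deep=True)          # strings stored per row
category_bytes = as_category.memory_usage(deep=True)  # integer codes plus 3 categories

print(plain_bytes, category_bytes)
print(as_category.cat.categories.tolist())
```

The benefit shrinks as the number of distinct values approaches the number of rows.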


In [None]:
# How do you get the top N rows with the highest values in a column?
df.nlargest(5, 'column_name')


In [None]:
# How do you count occurrences of each value in a column?
df['column_name'].value_counts()


In [None]:
# How do you shuffle a DataFrame randomly?
df = df.sample(frac=1).reset_index(drop=True)


In [23]:
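# How do you create a rolling window mean?
# Example data: the df from the earlier cells has no column 'A', which raises KeyError: 'A'.
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})
df['rolling_avg'] = df['A'].rolling(window=3).mean()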
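
A self-contained sketch of a rolling mean on a hypothetical column `A` (the frame `readings` is invented for illustration). It also shows the `min_periods` parameter, which controls how incomplete windows at the start are handled:

```python
import pandas as pd

readings = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# The first two rows are NaN because fewer than 3 values are available.
readings['rolling_avg'] = readings['A'].rolling(window=3).mean()

# min_periods=1 averages whatever values are available instead.
readings['rolling_avg_partial'] = (
    readings['A'].rolling(window=3, min_periods=1).mean()
)

print(readings)
```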

In [None]:
# How do you apply a function row-wise in Pandas?
# Answer:
# Use apply(axis=1).
df['new_column'] = df.apply(lambda row: row['A'] + row['B'], axis=1)


In [None]:
# How do you check memory usage of a DataFrame?
# Answer:
# Use memory_usage().
df.memory_usage(deep=True)


In [None]:
# How do you efficiently iterate over rows in Pandas?
# Answer:
# Use itertuples() instead of iterrows() for better performance.
for row in df.itertuples():
    print(row.A, row.B)
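
itertuples() yields one namedtuple per row, which is cheaper than the Series that iterrows() builds for each row. A minimal sketch on a hypothetical frame `points`, with a vectorized equivalent for comparison:

```python
import pandas as pd

points = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})

# index=False drops the Index field; name sets the namedtuple class name.
totals = [row.A + row.B for row in points.itertuples(index=False, name='Point')]

# When the logic allows it, a vectorized expression avoids iteration entirely.
vectorized_totals = (points['A'] + points['B']).tolist()

print(totals)
```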


In [None]:
# Support for Real-Time or Streaming Data (Pandas vs. Dask):
# Pandas: loads the full dataset into memory and evaluates eagerly, so it suits static data that fits in RAM.

# Dask: splits the data into partitions and evaluates lazily, so it can process datasets larger than memory and in parallel. Dask DataFrame is batch-oriented; it is not itself a streaming engine.
# Note: 'large_data.csv' and 'column' are placeholders. Running this cell without the file raises FileNotFoundError.
# Pandas
import pandas as pd
df = pd.read_csv('large_data.csv')
result = df.groupby('column').mean()

# Dask
import dask.dataframe as dd
ddf = dd.read_csv('large_data.csv')  # lazy: nothing is read yet
result = ddf.groupby('column').mean().compute()  # compute() triggers the computation