## Pandas

**What is pandas?**
Pandas is an open-source data manipulation and analysis library for Python.

**How to import Pandas?**

In [None]:
import pandas as pd

**What is a DataFrame?**
A DataFrame is a two-dimensional, tabular data structure in Pandas with labeled rows and columns, where each column can hold a different data type.

**How to create a DataFrame in Pandas?**

In [None]:
df = pd.DataFrame({'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']})

**How to select specific columns in a DataFrame?**
You can select specific columns using double square brackets: df[['Column1', 'Column2']]
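As a minimal sketch, the snippet below rebuilds the small `df` from the creation example so it runs on its own, and contrasts single and double brackets. The variable names `single`, `subset`, and `both` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']})

# Single brackets with one label return a Series
single = df['Column1']

# Double brackets (a list of labels) return a DataFrame, even for one column
subset = df[['Column1']]
both = df[['Column1', 'Column2']]
```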

**What is the purpose of the describe() function in Pandas?**
describe() provides summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of a DataFrame by default.

In [None]:
df.describe()

**Explain the concept of broadcasting in Pandas.**
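Broadcasting allows operations between operands of different shapes by stretching the smaller one to match the larger. For example, a scalar multiplied with a column is applied to every element of that column.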

In [None]:
df['Column'] = df['Column'] * 2
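A small self-contained sketch of two common cases, using made-up column names `a` and `b`: a scalar broadcast over a column, and a Series broadcast across every row of a DataFrame (pandas aligns the Series index with the column labels).

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# A scalar is broadcast to every element of the column
doubled = df['a'] * 2

# A Series indexed by column labels is broadcast across every row
offsets = pd.Series({'a': 1, 'b': 100})
shifted = df + offsets
```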

**Explain the purpose of the crosstab() function in Pandas.**
crosstab() computes a cross-tabulation of two or more factors. By default it counts how often each combination of values occurs.

In [None]:
pd.crosstab(df['Factor1'], df['Factor2'])
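To make the result concrete, here is a sketch with invented data for `Factor1` and `Factor2`. The values are hypothetical and only serve to show the shape of the output.

```python
import pandas as pd

# Hypothetical data: five observations of two categorical factors
df = pd.DataFrame({
    'Factor1': ['x', 'x', 'y', 'y', 'y'],
    'Factor2': ['p', 'q', 'p', 'p', 'q'],
})

# Rows are Factor1 values, columns are Factor2 values, cells are counts
counts = pd.crosstab(df['Factor1'], df['Factor2'])
```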

**How to handle categorical data in Pandas?**
You can use the astype() method to convert a column to a categorical type:

In [None]:
df['Category'] = df['Category'].astype('category')
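A short sketch with invented labels shows what the conversion gives you: the distinct categories and the integer codes stored behind them. When the categories are inferred from strings, they are ordered lexically.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['low', 'high', 'low', 'medium']})
df['Category'] = df['Category'].astype('category')

# The .cat accessor exposes the categories and their integer codes
categories = list(df['Category'].cat.categories)
codes = df['Category'].cat.codes.tolist()
```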

**Explain the use of the nunique() function in Pandas.**
nunique() returns the number of unique elements in a column, excluding missing values by default.

In [None]:
df['Column'].nunique()

**What is the use of the nlargest() function in Pandas?**
nlargest() returns the n largest values from a Series, in descending order. On a DataFrame it returns the n rows with the largest values in the columns you specify.

In [None]:
df['Column'].nlargest(5)

**How to convert a Pandas DataFrame to a NumPy array?**
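You can use the to_numpy() method: df.to_numpy(). The older values attribute (df.values) also works, but the pandas documentation recommends to_numpy().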
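A minimal sketch with two numeric columns. Note that mixing integer and float columns yields a single common dtype for the whole array.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3], 'Column2': [4.0, 5.0, 6.0]})

# The result is a 2-D NumPy array with one row per DataFrame row
arr = df.to_numpy()
```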

**What is the difference between loc and iloc in Pandas?**
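loc selects by label (index and column names), while iloc selects by integer position. Slices with loc include the end label; slices with iloc exclude the end position.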
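The difference is easiest to see on a DataFrame whose index is not the default 0, 1, 2. The row labels `r1` to `r3` below are invented for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3]}, index=['r1', 'r2', 'r3'])

# Same cell, addressed by label and by position
by_label = df.loc['r2', 'Column1']
by_position = df.iloc[1, 0]

# loc slices include the end label; iloc slices exclude the end position
label_slice = df.loc['r1':'r2']
position_slice = df.iloc[0:1]
```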

**How to drop a column in a DataFrame?**
You can drop a column using the drop() method: df.drop('ColumnName', axis=1, inplace=True)

**Explain the use of groupby() in Pandas.**
groupby() groups the rows of a DataFrame by the values in one or more columns so that an aggregate function can be applied to each group.

In [None]:
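# numeric_only=True skips non-numeric columns, which would otherwise raise an error in recent pandas versions
df.groupby('Column').mean(numeric_only=True)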
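A self-contained sketch with hypothetical `team` and `score` columns: one aggregate per group, then several at once with agg().

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['a', 'a', 'b', 'b'],
    'score': [10, 20, 30, 50],
})

# Select the numeric column first so only it is aggregated
mean_by_team = df.groupby('team')['score'].mean()

# agg() computes several statistics per group in one call
summary = df.groupby('team')['score'].agg(['min', 'max', 'sum'])
```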

**What is the purpose of the apply() function in Pandas?**
apply() applies a function along an axis of a DataFrame (to each column or each row), or to each element of a Series.

In [None]:
df['Column'].apply(lambda x: x*2)

**How to filter rows in a DataFrame based on a condition?**
You can use boolean indexing to filter rows based on a condition: df[df['Column'] > 10]
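A brief sketch with made-up data, including how to combine conditions. The `Label` column is an assumption added for the second filter.

```python
import pandas as pd

df = pd.DataFrame({'Column': [5, 12, 20], 'Label': ['x', 'y', 'x']})

# The comparison produces a boolean Series used as a row mask
over_ten = df[df['Column'] > 10]

# Combine conditions with & (and) or | (or); each condition needs parentheses
over_ten_x = df[(df['Column'] > 10) & (df['Label'] == 'x')]
```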

**What is the purpose of the pivot_table() function?**
pivot_table() is used to create a spreadsheet-style pivot table as a DataFrame.

In [None]:
pd.pivot_table(df, values='Value', index='Index', columns='Column', aggfunc='sum')
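Here is a sketch that runs on its own, with invented rows for the `Index`, `Column`, and `Value` columns. Duplicate (Index, Column) pairs are combined by the aggregation function.

```python
import pandas as pd

df = pd.DataFrame({
    'Index': ['r1', 'r1', 'r2', 'r2', 'r2'],
    'Column': ['c1', 'c2', 'c1', 'c1', 'c2'],
    'Value': [1, 2, 3, 4, 5],
})

# (r2, c1) appears twice, so its values 3 and 4 are summed
table = pd.pivot_table(df, values='Value', index='Index',
                       columns='Column', aggfunc='sum')
```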

**How to handle duplicate values in a DataFrame?**
You can use drop_duplicates() to remove duplicate rows: df.drop_duplicates()
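A short sketch with hypothetical columns `k` and `v`. It also shows duplicated() for inspecting duplicates first, and the subset and keep parameters for deduplicating on selected columns.

```python
import pandas as pd

df = pd.DataFrame({'k': [1, 1, 2], 'v': ['a', 'a', 'b']})

# duplicated() flags rows that repeat an earlier row
flags = df.duplicated()

# Drop fully identical rows, keeping the first occurrence
deduped = df.drop_duplicates()

# Deduplicate on one column only, keeping the last occurrence
last_per_key = df.drop_duplicates(subset='k', keep='last')
```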

**Explain the purpose of the iterrows() function in Pandas.**
iterrows() is used to iterate over DataFrame rows as (index, Series) pairs.

In [None]:
for index, row in df.iterrows():
    print(index, row['Column'])

**Explain the use of melt() function in Pandas.**
melt() is used to reshape or transform data by unpivoting it from wide format to long format.

In [None]:
pd.melt(df, id_vars=['ID'], value_vars=['Var1', 'Var2'])
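A runnable sketch with invented values shows the wide-to-long reshaping. The output column names `variable` and `value` are the pandas defaults.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2], 'Var1': [10, 20], 'Var2': [30, 40]})

# Each (ID, VarN) cell becomes its own row in the long format
long_df = pd.melt(df, id_vars=['ID'], value_vars=['Var1', 'Var2'])
```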

**What is the purpose of the to_csv() method in Pandas?**
to_csv() is used to write a DataFrame to a CSV file.

In [None]:
df.to_csv('output.csv', index=False)

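**How to calculate correlation between columns in a DataFrame?**
You can use the corr() method: df.corr(numeric_only=True). It returns the pairwise correlation (Pearson by default) between the numeric columns.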

**Explain the purpose of the get_dummies() function in Pandas.**
get_dummies() is used for one-hot encoding categorical variables.

In [None]:
pd.get_dummies(df['Category'])
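A minimal sketch with made-up color labels. Each distinct value becomes its own indicator column; the exact dtype of those columns depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['red', 'blue', 'red']})

# One indicator column per distinct category value
dummies = pd.get_dummies(df['Category'])
```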

**Explain the use of merge() in Pandas.**
merge() is used to combine two DataFrames based on a common column.

In [None]:
pd.merge(df1, df2, on='common_column')
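The sketch below builds two small hypothetical frames and compares the default inner join with a left join. The column names `left_val` and `right_val` are invented for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'common_column': [1, 2, 3], 'left_val': ['a', 'b', 'c']})
df2 = pd.DataFrame({'common_column': [2, 3, 4], 'right_val': ['x', 'y', 'z']})

# Default how='inner': only keys present in both frames are kept
inner = pd.merge(df1, df2, on='common_column')

# how='left': every row of df1 is kept; unmatched rows get missing values
left = pd.merge(df1, df2, on='common_column', how='left')
```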

**How to handle missing values in a DataFrame?**
You can use methods like dropna() to remove missing values or fillna() to fill them with a specific value.

In [None]:
df.dropna()   # drop rows that contain any missing value
df.fillna(0)  # or replace missing values with a chosen value, here 0
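A self-contained sketch with invented data: count the missing values first, then either drop or fill them. The fill values are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', 'y', None]})

# Count missing values per column
n_missing = df.isna().sum()

# Drop every row that has at least one missing value
dropped = df.dropna()

# Fill with a different replacement per column
filled = df.fillna({'a': 0.0, 'b': 'unknown'})
```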

**How to rename columns in a DataFrame?**
You can use the rename() method, which returns a new DataFrame with the columns renamed:

In [None]:
df.rename(columns={'OldName': 'NewName'})

**Explain the concept of MultiIndex in Pandas.**
MultiIndex allows you to have multiple (hierarchical) index levels on an axis, so each row or column is identified by a tuple of labels.
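As a sketch, the example below builds a two-level index from tuples and selects from it. The level names `year` and `quarter` and the sales figures are invented.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('2023', 'Q1'), ('2023', 'Q2'), ('2024', 'Q1')],
    names=['year', 'quarter'],
)
sales = pd.Series([100, 120, 130], index=index)

# Selecting on the outer level leaves the inner level as the index
sales_2023 = sales.loc['2023']

# A full tuple addresses a single element
q1_2024 = sales.loc[('2024', 'Q1')]
```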

**How to handle time series data in Pandas?**
Pandas provides the Timestamp type and functions like resample() for time series analysis.

In [None]:
df['Date'] = pd.to_datetime(df['Date'])
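A sketch of the usual flow, with invented dates and values: parse the strings, move the dates into the index, then resample. The weekly frequency `'W'` is just one possible choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-02', '2024-01-08'],
    'Value': [1, 2, 3],
})
df['Date'] = pd.to_datetime(df['Date'])

# resample() needs a datetime-like index, so set it first;
# 'W' groups the rows into weekly bins ending on Sunday
weekly = df.set_index('Date')['Value'].resample('W').sum()
```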

**Explain the purpose of the cut() function in Pandas.**
cut() is used to segment continuous data values into discrete bins (intervals), returning a categorical result.

In [None]:
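# right=False makes the bins [0, 10), [10, 20), [20, 30), so the labels match the intervals
pd.cut(df['Values'], bins=[0, 10, 20, 30], right=False, labels=['<10', '10-20', '20-30'])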
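One detail worth checking is which bin edge is inclusive. By default cut() closes each interval on the right, so a value equal to the lowest edge falls outside every bin. The sketch below, with made-up values, compares that default with right=False.

```python
import pandas as pd

values = pd.Series([0, 5, 10, 25])

# Default right=True: bins are (0, 10], (10, 20], (20, 30]; 0 is not in any bin
right_closed = pd.cut(values, bins=[0, 10, 20, 30])

# right=False: bins are [0, 10), [10, 20), [20, 30)
left_closed = pd.cut(values, bins=[0, 10, 20, 30], right=False,
                     labels=['<10', '10-20', '20-30'])
```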