# Pandas - Advanced Dataframe Operations

## What is the key difference between the Python values `None`, `0`, and `NaN`?

- A) `None` is a special value that represents the absence of a value, `0` represents the integer zero, and `NaN` represents "Not a Number". ***
- B) `None` is a float, `0` is an integer, and `NaN` is a special value used only in scientific computing.
- C) `None` and `0` are both valid numeric values, while `NaN` is not.
- D) `None` is equivalent to 0, while `NaN` is a value that represents an undefined or unrepresentable value.
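
The distinction can be checked directly. The following is a minimal sketch, assuming `NaN` is produced with `float("nan")`; the variable names are illustrative only.

```python
import math
import pandas as pd

nothing = None        # absence of a value
zero = 0              # the integer zero
nan = float("nan")    # "Not a Number", stored as a float

none_is_none = nothing is None    # identity is the idiomatic test for None
nan_equals_itself = nan == nan    # NaN never compares equal, even to itself
nan_detected = math.isnan(nan)    # use isnan() to detect it instead

# pandas treats None and NaN as missing values; 0 is an ordinary value.
missing = pd.Series([None, 0, nan]).isna().tolist()
print(none_is_none, nan_equals_itself, nan_detected, missing)
```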

## What is the correct way to add a new column named `total` to a pandas DataFrame `df`, where `total` is the sum of two existing columns `col1` and `col2`?

- A) `df.total = df.col1 + df.col2`
- B) `df['total'] = df.col1 + df.col2` ***
- C) `df.add_column('total', df.col1 + df.col2)`
- D) `df.insert('total', df.col1 + df.col2)`
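
A minimal sketch of bracket assignment, assuming a small made-up DataFrame with the two columns named in the question:

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})

# Bracket assignment creates the column if it does not exist yet.
df["total"] = df["col1"] + df["col2"]
print(df)
```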

## What is the correct way to get a copy of the same DataFrame `df` without the `total` column, leaving `df` itself unchanged?

- A) `df.total.drop()`
- B) `df.drop('total', axis=1)` ***
- C) `del df.total`
- D) `df.pop('total')`
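
For comparison, here is a short sketch of how `drop` and `pop` behave on a made-up DataFrame. `drop` returns a new DataFrame by default, while `pop` removes the column in place and returns it as a Series.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "total": [4, 6]})

# drop() returns a new DataFrame; df is untouched unless the result is reassigned.
without_total = df.drop("total", axis=1)
still_present = "total" in df.columns

# pop() removes the column from df itself and returns the removed column.
popped = df.pop("total")
print(list(without_total.columns), still_present, list(df.columns))
```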

## Suppose you have a pandas DataFrame `df` that contains a column named `sales`. What will happen if you execute the following code to add a new column with the same name?

`df['sales'] = [10, 20, 30, 40]`

- A) The existing `sales` column will be overwritten with the new values. ***
- B) A new column with the same name sales will be added to the DataFrame, with the new values.
- C) The code will raise an error, because you cannot add a column with a name that already exists.
- D) The behavior of the code is undefined and can vary based on the version of pandas being used.
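
A minimal sketch of this, assuming a four-row DataFrame so that the list length matches; the `region` column is a hypothetical extra added for illustration:

```python
import pandas as pd

# Made-up sample data; the DataFrame must have four rows to accept a four-item list.
df = pd.DataFrame({"sales": [1, 2, 3, 4], "region": ["N", "S", "E", "W"]})

# Assigning to an existing column name replaces its values in place.
df["sales"] = [10, 20, 30, 40]
print(df)
```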

## Suppose you have a pandas DataFrame `df` with the following columns: `Name`, `Age`, `Salary`, and `Country`. You want to sort the DataFrame by the `Salary` column in descending order, and modify the original DataFrame instead of creating a copy. Which of the following statements describes code that does this?

- A) `df.sort_values(by='Salary', ascending=False, inplace=True)` will sort the DataFrame by the `Salary` column in descending order and modify the original DataFrame. ***

- B) `df.sort_values(by='Salary', ascending=False)` will sort the DataFrame by the `Salary` column in descending order, but will not modify the original DataFrame.

- C) `df = df.sort_values(by='Salary', ascending=False)` will sort the DataFrame by the `Salary` column in descending order and create a new DataFrame `df` instead of modifying the original DataFrame.

- D) A and C are correct.
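
A minimal sketch of in-place sorting, using made-up rows for the four columns in the question. With `inplace=True` the call returns `None` and reorders `df` itself.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cy"],
    "Age": [30, 25, 41],
    "Salary": [50000, 72000, 61000],
    "Country": ["PT", "DE", "US"],
})

# inplace=True modifies df and returns None, so there is nothing to assign.
result = df.sort_values(by="Salary", ascending=False, inplace=True)
print(df)
```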

## Suppose you have a pandas DataFrame `df` with the following columns: `Name`, `Age`, `Salary`, and `Country`. You want to drop the `Salary` column without modifying the original DataFrame. Which of the following statements is correct?

- A) `df_copy = df.drop('Salary', axis=1).copy()` will create a new DataFrame `df_copy` and drop the `Salary` column from the original DataFrame `df`.

- B) `df_copy = df.copy().drop('Salary', axis=1)` will create a new DataFrame `df_copy` without the `Salary` column, leaving the original DataFrame `df` unchanged. ***

- C) `df.drop('Salary', axis=1, inplace=True).copy()` will drop the `Salary` column from the original DataFrame `df` and create a new DataFrame by copying the modified DataFrame.

- D) None of the above statements are correct.
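
The following sketch, using made-up rows, shows what `drop` returns with and without `inplace=True`. The point to watch is that an in-place call returns `None`, so its result should not be assigned or chained.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "Name": ["Ana", "Ben"],
    "Age": [30, 25],
    "Salary": [50000, 72000],
    "Country": ["PT", "DE"],
})

# Without inplace, drop() returns a new DataFrame and leaves df alone.
df_copy = df.drop("Salary", axis=1)

# With inplace=True, drop() returns None; only the temporary copy is changed.
returned = df.copy().drop("Salary", axis=1, inplace=True)
print(list(df.columns), list(df_copy.columns), returned)
```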

## Which of the following statements is true about the difference between the `loc` and `iloc` methods in pandas?

- A) The `loc` method selects rows and columns by label-based indexing, while the `iloc` method selects rows and columns by integer-based indexing. ***
- B) The `loc` method selects rows and columns by integer-based indexing, while the `iloc` method selects rows and columns by label-based indexing.
- C) The `loc` and `iloc` methods are equivalent and can be used interchangeably to select rows and columns from a DataFrame.
- D) None of the above statements are true.
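
A minimal sketch of the difference, assuming a made-up DataFrame whose row labels (`r1`, `r2`, `r3`) are strings so that labels and positions cannot be confused:

```python
import pandas as pd

# Made-up sample data with string row labels.
df = pd.DataFrame(
    {"Name": ["Ana", "Ben", "Cy"], "Age": [30, 41, 25]},
    index=["r1", "r2", "r3"],
)

by_label = df.loc["r2", "Age"]    # row label "r2", column label "Age"
by_position = df.iloc[1, 1]       # second row, second column
print(by_label, by_position)
```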


## Suppose you have a pandas DataFrame `df` with the following columns: `Name`, `Age`, `Salary`, and `Country`. How would you select all the rows of the `Age` column?

- A) `ages = df.loc['Age']`
- B) `ages = df.iloc['Age']`
- C) `ages = df.iloc[:, 'Age']`
- D) `ages = df.loc[:, 'Age']` ***
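
A short sketch on made-up data: `loc[:, 'Age']` reads as "all rows, column `Age`", whereas `loc['Age']` looks for a row labelled `Age` and fails when there is none.

```python
import pandas as pd

# Made-up sample data with the default integer index.
df = pd.DataFrame({"Name": ["Ana", "Ben"], "Age": [30, 25]})

ages = df.loc[:, "Age"]    # every row of the Age column, as a Series

# A single label passed to loc is treated as a row label.
try:
    df.loc["Age"]
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True
print(ages.tolist(), row_lookup_failed)
```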

## Suppose you have a pandas DataFrame `df` with the following columns: `Name`, `Age`, `Salary`, and `Country`. You then sort the DataFrame by descending age using the following code `sorted_df = df.sort_values(by='Age', ascending=False)`. How would you then select the first row, the entry with the highest age?

- A) `highest_age_entry = sorted_df.iloc[0, :]` ***
- B) `highest_age_entry = sorted_df.iloc[:, 0]`
- C) `highest_age_entry = sorted_df.loc[0, :]`
- D) `highest_age_entry = sorted_df.iloc[1]`
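
Sorting keeps the original index labels, which is why position and label give different rows here. A minimal sketch with made-up data:

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cy"],
    "Age": [25, 41, 30],
    "Salary": [50000, 72000, 61000],
    "Country": ["PT", "DE", "US"],
})
sorted_df = df.sort_values(by="Age", ascending=False)

first_by_position = sorted_df.iloc[0, :]   # first row after sorting
row_with_label_0 = sorted_df.loc[0, :]     # row whose index label is 0
print(first_by_position["Name"], row_with_label_0["Name"])
```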

## What is the ISO standard format for representing dates and times in a machine-readable and unambiguous way?

- A) `DD/MM/YYYY hh:mm:ss`
- B) `YYYY-MM-DD hh:mm:ss` ***
- C) `MM/DD/YYYY hh:mm:ss`
- D) `hh:mm:ss DD/MM/YYYY`

## Assuming you have a pandas DataFrame named `df` with a column named `time_col` that holds datetime values (dtype `datetime64[ns]`), which of the following lines of code would format `time_col` as strings in the ISO standard format?

- A) `df['time_col'].to_datetime(format='%Y-%m-%d %H:%M:%S')`
- B) `df['time_col'].strftime('%Y-%m-%d %H:%M:%S')`
- C) `df['time_col'].to_iso_datetime()`
- D) `df['time_col'].dt.strftime('%Y-%m-%d %H:%M:%S')` ***
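
A minimal sketch of one way to do this, assuming the column starts out as day-first text (a made-up input format). The module-level function `pd.to_datetime` parses the text, and the `.dt.strftime` accessor then renders each value as a string.

```python
import pandas as pd

# Made-up sample data: day-first date strings.
df = pd.DataFrame({"time_col": ["03/01/2024 14:30:00", "25/12/2023 08:05:09"]})

# Parse the text into datetime values, stating the input format explicitly.
df["time_col"] = pd.to_datetime(df["time_col"], format="%d/%m/%Y %H:%M:%S")

# Render the datetime values as year-first strings.
iso_text = df["time_col"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(iso_text.tolist())
```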


## Suppose you have a pandas DataFrame `df` with columns `A`, `B`, and `C`. Which of the following code snippets would create a new DataFrame `new_df` containing only the rows where the value in column `A` is greater than 10 and the value in column `B` is less than 5?

- A) `new_df = df[df['A'] > 10 and df['B'] < 5]`
- B) `new_df = df[(df['A'] > 10) & (df['B'] < 5)]` ***
- C) `new_df = df[df['A'] > 10 || df['B'] < 5]`
- D) `new_df = df[(df['A'] > 10) and (df['B'] < 5)]`
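
A minimal sketch on made-up data. Element-wise conditions are combined with `&`, and each comparison needs its own parentheses; the Python keyword `and` raises a `ValueError` because the truth value of a whole Series is ambiguous.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "A": [5, 12, 20, 15],
    "B": [1, 3, 7, 4],
    "C": ["w", "x", "y", "z"],
})

# Element-wise AND of two boolean Series.
new_df = df[(df["A"] > 10) & (df["B"] < 5)]

# The keyword `and` tries to reduce each Series to a single bool and fails.
try:
    df[(df["A"] > 10) and (df["B"] < 5)]
    and_failed = False
except ValueError:
    and_failed = True
print(new_df, and_failed)
```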

## Suppose you have a pandas DataFrame `df` with a column `age`. Which of the following code snippets would create a new DataFrame `new_df` containing only the rows where the age is greater than or equal to 18?

- A) `new_df = df[df.age >= 18]` ***
- B) `new_df = df[df['age'] > 18]`
- C) `new_df = df[df.age == 18]`
- D) `new_df = df[df.age != 18]`
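
The boundary value is what separates `>=` from `>`. A short sketch with made-up ages:

```python
import pandas as pd

# Made-up sample data that includes the boundary value 18.
df = pd.DataFrame({"age": [17, 18, 19]})

at_least_18 = df[df.age >= 18]    # keeps 18 and 19
over_18 = df[df["age"] > 18]      # keeps only 19
print(at_least_18["age"].tolist(), over_18["age"].tolist())
```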

## Suppose you have a pandas DataFrame `df` with columns `name`, `age`, and `gender`. Which of the following code snippets would create a new DataFrame `new_df` by first setting the index to the `name` column and then dropping the `gender` column?

- A) `new_df = df.set_index('name').drop('gender', axis=1)` ***
- B) `new_df = df.drop('gender', axis=1).set_index('name')`
- C) `new_df = df.set_index('name', 'gender').drop('gender', axis=1)`
- D) `new_df = df.set_index(['name'], drop=True)[['age']]`
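
A minimal sketch of chaining the two calls on made-up data. Both `set_index` and `drop` return new DataFrames, so `df` itself is unchanged.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "age": [30, 25],
    "gender": ["F", "M"],
})

# Move `name` into the index, then drop the `gender` column.
new_df = df.set_index("name").drop("gender", axis=1)
print(new_df)
```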

## Suppose you have a pandas DataFrame `df` with an index set to the `name` column and a column `age`. Which of the following code snippets would reset the index of `df` to a default integer index while keeping the `name` column?

- A) `df.reset_index(inplace=True)` ***
- B) `df.reset_index(inplace=True, col='name')`
- C) `df.reset_index(inplace=True, columns=['name'])`
- D) `df.reset_index(drop=True, inplace=True)`
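
A short sketch contrasting the default behaviour with `drop=True`, on made-up data. By default the old index comes back as a regular column; with `drop=True` it is discarded.

```python
import pandas as pd

# Made-up sample data indexed by name.
df = pd.DataFrame(
    {"age": [30, 25]},
    index=pd.Index(["Ana", "Ben"], name="name"),
)

# drop=True throws the old index away (a new DataFrame is returned here).
dropped = df.reset_index(drop=True)

# The default keeps the old index as a `name` column; inplace=True edits df.
df.reset_index(inplace=True)
print(list(df.columns), list(dropped.columns))
```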