## Q1. What is data encoding? How is it useful in data science?

Data encoding is the process of converting categorical (non-numerical) data into a numerical format that machine learning algorithms can consume. It is useful in data science because most algorithms operate on numbers and cannot handle raw category labels such as strings directly. Mapping categories to numerical representations lets models train on those features, and choosing an encoding that matches the data (ordered or unordered, few or many categories) avoids introducing misleading relationships.

---

## Q2. What is nominal encoding? Provide an example of how you would use it in a real-world scenario.

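Nominal encoding, as the term is used here, assigns a unique integer to each distinct category in a column; this is commonly called Label Encoding (or integer encoding). It applies to categorical data with no inherent ordering, so the integers are arbitrary identifiers. Note that some models, such as linear models, may still interpret the codes as ordered magnitudes.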

**Example:** 
Suppose you have a dataset of customer feedback where the column "Feedback Type" has three categories: "Positive," "Negative," and "Neutral." Using nominal encoding, you could convert these values to integers, such as:
- Positive: 0
- Negative: 1
- Neutral: 2

This transformation allows algorithms to process the data numerically.
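
As a minimal sketch of the mapping above, assuming the data lives in a pandas DataFrame (the column name `feedback_type` and the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical sample of the "Feedback Type" column
feedback = pd.DataFrame(
    {"feedback_type": ["Positive", "Negative", "Neutral", "Positive"]}
)

# Explicit category -> integer mapping, matching the list above
mapping = {"Positive": 0, "Negative": 1, "Neutral": 2}

# Replace each label with its integer code
feedback["feedback_code"] = feedback["feedback_type"].map(mapping)

print(feedback["feedback_code"].tolist())  # [0, 1, 2, 0]
```

Writing the mapping out by hand keeps the codes stable across runs, which an automatically derived mapping does not guarantee if the set of observed categories changes.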

---

## Q3. In what situations is nominal encoding preferred over one-hot encoding? Provide a practical example.

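Nominal encoding is preferred over one-hot encoding when:
- The categorical column has a high number of unique categories.
- The categories have no ordinal relationship, and the model will not read the arbitrary integer codes as magnitudes (tree-based models are generally tolerant of this; linear and distance-based models are not).
- One-hot encoding would create so many binary columns that it causes high-dimensionality and sparsity problems.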

**Example:** 
In a large supermarket dataset with 1000 unique product categories, one-hot encoding would create 1000 binary columns, nearly all of them zero in any given row. Nominal encoding keeps a single column in which each product category gets a unique integer, which is more practical and computationally cheaper.
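
The difference in width can be illustrated with a small sketch. The data here is synthetic and the column name `product_category` is an assumption; `pd.factorize` is used as one possible way to assign integer codes:

```python
import pandas as pd

# Synthetic data: 5000 rows spread over 1000 hypothetical product categories
products = pd.DataFrame(
    {"product_category": [f"category_{i % 1000}" for i in range(5000)]}
)

# Integer (nominal) encoding: one column of codes, one code per category
codes, uniques = pd.factorize(products["product_category"])
products["category_code"] = codes

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(products["product_category"])

print(products[["category_code"]].shape)  # (5000, 1)
print(one_hot.shape)                      # (5000, 1000)
```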

---

## Q4. Suppose you have a dataset containing categorical data with 5 unique values. Which encoding technique would you use to transform this data into a format suitable for machine learning algorithms?

For categorical data with only 5 unique values, I would use **One-Hot Encoding**, assuming the values have no natural order. It creates one binary column per category (5 here), which is a small cost, and it ensures that no numerical relationship is implied between the categories, so the model cannot draw misleading conclusions from arbitrary integer codes.
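
A short sketch of this with pandas, using a hypothetical column `category` with five made-up values:

```python
import pandas as pd

# Hypothetical column with 5 unique values
df = pd.DataFrame({"category": ["A", "B", "C", "D", "E", "A"]})

# One binary column per category; dtype=int gives 0/1 instead of booleans
encoded = pd.get_dummies(df, columns=["category"], dtype=int)

print(encoded.shape)  # (6, 5)
print(encoded.columns.tolist())
```

Each row has exactly one 1 across the five new columns, marking which category that row belongs to.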

---

## Q5. In a machine learning project, you have a dataset with 1000 rows and 5 columns. Two of the columns are categorical, and the remaining three columns are numerical. If you were to use nominal encoding to transform the categorical data, how many new columns would be created?

Nominal encoding creates zero new columns: it replaces each category with an integer in place.

- Before encoding, the dataset has 5 columns (2 categorical + 3 numerical).
- After encoding, it still has 5 columns; the 2 categorical columns now hold integers instead of labels.
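
The column count can be checked with a small sketch. The column names and values are invented for illustration; only the 1000 x 5 shape comes from the question:

```python
import pandas as pd

n_rows = 1000

# Hypothetical dataset: 2 categorical columns and 3 numerical columns
data = pd.DataFrame({
    "cat_a": ["x", "y", "z", "y"] * (n_rows // 4),
    "cat_b": ["low", "high"] * (n_rows // 2),
    "num_1": range(n_rows),
    "num_2": [0.5] * n_rows,
    "num_3": [1] * n_rows,
})

# Replace each categorical column with integer codes, in place
encoded = data.copy()
for col in ["cat_a", "cat_b"]:
    encoded[col] = encoded[col].astype("category").cat.codes

new_columns = encoded.shape[1] - data.shape[1]
print(data.shape, encoded.shape, new_columns)  # (1000, 5) (1000, 5) 0
```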

---

## Q6. You are working with a dataset containing information about different types of animals, including their species, habitat, and diet. Which encoding technique would you use to transform the categorical data into a format suitable for machine learning algorithms? Justify your answer.

I would use **One-Hot Encoding** for species, habitat, and diet, because these features are nominal (no inherent order) and each likely has a manageable number of categories. One-Hot Encoding prevents the machine learning algorithm from assuming a numerical ranking between categories, which would be meaningless for animal species or habitats. If species turned out to have a very large number of distinct values, the resulting column count would need to be weighed against this benefit.
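
A minimal sketch, assuming a small made-up sample of such a dataset (the rows and category values are hypothetical):

```python
import pandas as pd

# Hypothetical animal records
animals = pd.DataFrame({
    "species": ["lion", "penguin", "cow"],
    "habitat": ["savanna", "polar", "farmland"],
    "diet": ["carnivore", "carnivore", "herbivore"],
})

# One-hot encode all three nominal features at once
encoded = pd.get_dummies(
    animals, columns=["species", "habitat", "diet"], dtype=int
)

# 3 species + 3 habitats + 2 diets = 8 binary columns
print(encoded.shape)  # (3, 8)
```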

---

## Q7. You are working on a project that involves predicting customer churn for a telecommunications company. You have a dataset with 5 features, including the customer's gender, age, contract type, monthly charges, and tenure. Which encoding technique(s) would you use to transform the categorical data into numerical data? Provide a step-by-step explanation of how you would implement the encoding.

1. **Gender (Categorical):** This feature has two categories (e.g., Male, Female). I would use **Label Encoding** or **One-Hot Encoding**. With only two categories, Label Encoding suffices by assigning 0 and 1, since a binary feature carries no misleading order. One-Hot Encoding also works, and with one of its two columns dropped it produces the same single 0/1 column.

2. **Contract Type (Categorical):** Contract types may include multiple categories such as "Month-to-Month," "One-Year," or "Two-Year." I would use **One-Hot Encoding** to create separate binary columns for each contract type to prevent any ordinal assumption about the contract duration.

3. **Age, Monthly Charges, Tenure (Numerical):** These are already numerical features, so no encoding is required.

**Step-by-step:**
- Apply **Label Encoding** or **One-Hot Encoding** to the `Gender` feature.
- Apply **One-Hot Encoding** to the `Contract Type` feature.
- Leave the `Age`, `Monthly Charges`, and `Tenure` features as they are since they are already numerical.
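
The steps above can be sketched with pandas as follows. The sample rows are invented for illustration, and the choice of 0 for Male and 1 for Female is arbitrary:

```python
import pandas as pd

# Hypothetical sample of the churn dataset
churn = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [34, 52, 27, 45],
    "Contract Type": ["Month-to-Month", "One-Year", "Two-Year", "Month-to-Month"],
    "Monthly Charges": [70.5, 55.0, 89.9, 40.0],
    "Tenure": [5, 24, 48, 2],
})

# Step 1: label-encode the binary Gender feature (arbitrary 0/1 assignment)
churn["Gender"] = churn["Gender"].map({"Male": 0, "Female": 1})

# Step 2: one-hot encode Contract Type into one binary column per type
encoded = pd.get_dummies(churn, columns=["Contract Type"], dtype=int)

# Step 3: Age, Monthly Charges, and Tenure are left untouched
print(encoded.columns.tolist())
print(encoded.shape)  # (4, 7)
```

The result has 7 columns: the encoded `Gender`, the three numerical features, and three binary contract-type columns.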
