### Data Slicing 
Data slicing means dividing a dataset into subsets for purposes such as model validation, testing, or exploration. Partitioning the data this way lets you run a different analysis on each part, for example checking how a model performs on one subgroup compared with the dataset as a whole.

**Here are some common types of data slicing:**

- Train-Test Split: This is one of the most basic forms of data slicing. The dataset is divided into two parts: a training set and a testing set. The model is trained on the training set and then evaluated on the testing set to assess its performance.

- Cross-Validation: Instead of just one train-test split, cross-validation involves splitting the dataset into multiple subsets (folds). The model is trained on several combinations of these subsets and evaluated on the remaining parts. This helps in getting a more reliable estimate of the model's performance.

- Time-Based Slicing: In cases where the dataset has a temporal component (e.g., time series data), it's common to slice the data based on time intervals. For example, you might use data from the past to predict future outcomes. This ensures that the model is tested on unseen future data, which is crucial for assessing its real-world performance.

- Stratified Sampling: When dealing with imbalanced datasets (where one class is much more prevalent than others), it's important to ensure that each subset maintains the same class distribution as the original dataset. Stratified sampling ensures that each subset contains a representative sample of the overall data.

- Feature-based Slicing: Sometimes, you might want to slice the data based on specific features. For example, you might evaluate the model separately for each value of a categorical feature to see whether performance differs between groups.
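The splits described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the toy labels, and the `date` field are assumptions made for the example. In practice, scikit-learn's `train_test_split`, `KFold`, `StratifiedKFold`, and `TimeSeriesSplit` cover the same ground.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; every sample is used for validation exactly once."""
    indices = list(range(n_samples))
    for fold in range(k):
        val_idx = indices[fold::k]
        val_set = set(val_idx)
        train_idx = [i for i in indices if i not in val_set]
        yield train_idx, val_idx


def time_based_split(rows, cutoff):
    """Train on rows strictly before the cutoff date, test on rows at or after it."""
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test


# Imbalanced toy labels: 20% "spam", 80% "ham".
labels = ["spam"] * 8 + ["ham"] * 32
train_idx, test_idx = stratified_split(labels)
spam_share_test = sum(labels[i] == "spam" for i in test_idx) / len(test_idx)

folds = list(k_fold_indices(n_samples=10, k=5))

# ISO-formatted date strings compare correctly as plain strings.
rows = [{"date": "2024-01-10"}, {"date": "2024-02-20"}, {"date": "2024-03-05"}]
past, future = time_based_split(rows, cutoff="2024-03-01")
```

With these toy labels the test set keeps the 20% spam share of the full dataset, which is the point of stratifying.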

**How to check performance on slices**

Use unit tests. Just as you check overall model performance, you should check performance on each slice. Unit tests that run automatically help ensure that you do not accidentally deploy a model that underperforms on a slice that previously performed well.
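A slice check of this kind might look like the sketch below. The record layout, the `region` feature, and the 0.7 threshold are all assumptions made for illustration; in a real project the records would come from the model's predictions on a held-out set, and a runner such as pytest would pick up the `test_` function automatically.

```python
MIN_SLICE_ACCURACY = 0.7  # assumed threshold; derive yours from a known-good baseline

# Hypothetical evaluation records: one dict per example.
RECORDS = [
    {"region": "north", "label": 1, "prediction": 1},
    {"region": "north", "label": 0, "prediction": 0},
    {"region": "north", "label": 1, "prediction": 0},
    {"region": "north", "label": 0, "prediction": 0},
    {"region": "south", "label": 1, "prediction": 1},
    {"region": "south", "label": 0, "prediction": 0},
]


def accuracy_by_slice(records, slice_key):
    """Group records by the value of slice_key and return accuracy per group."""
    totals, correct = {}, {}
    for rec in records:
        key = rec[slice_key]
        totals[key] = totals.get(key, 0) + 1
        correct[key] = correct.get(key, 0) + int(rec["label"] == rec["prediction"])
    return {key: correct[key] / totals[key] for key in totals}


def test_slice_accuracy():
    """Fail if any slice falls below the minimum accuracy."""
    scores = accuracy_by_slice(RECORDS, "region")
    for name, score in scores.items():
        assert score >= MIN_SLICE_ACCURACY, f"slice {name!r} dropped to {score:.2f}"
```

Asserting on every slice separately means a regression in a small slice fails the test even when overall accuracy still looks healthy.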


### Data Bias
Data bias in machine learning refers to the presence of skewed or unrepresentative data that can lead to inaccuracies or unfairness in the model's predictions. It occurs when the training data does not accurately reflect the real-world distribution of the data, leading the model to learn and perpetuate the biases present in the training data.

**Types of Data Bias:**

- Selection Bias: Occurs when certain groups or types of data are overrepresented or underrepresented in the training dataset compared to the population it aims to represent.
- Sampling Bias: Arises when the data collection process systematically excludes certain groups or includes disproportionate samples from different populations.
- Label Bias: Happens when the labels or annotations assigned to the data are inaccurate, incomplete, or subjective, leading to biased learning.
- Historical Bias: Reflects biases present in historical data that may not accurately represent the current or desired state of affairs, potentially perpetuating unfairness or inequalities.

**Consequences:**

- Unfair Predictions: Biased models may produce predictions that disproportionately favor or disadvantage certain groups, leading to unfair outcomes.
- Inaccurate Generalization: Models trained on biased data may generalize poorly to new or diverse data, leading to unreliable predictions in real-world scenarios.
- Reinforcement of Stereotypes: Biased models can reinforce existing stereotypes or prejudices present in the training data, perpetuating social biases and inequalities.

**Addressing Data Bias:**

- Data Preprocessing: Techniques such as data cleaning, outlier detection, and data augmentation can help mitigate bias in the training data.
- Bias Detection: Tools and metrics are available to identify and quantify bias in the training data, allowing practitioners to understand and address potential sources of bias.
- Fairness-aware Modeling: Researchers and practitioners are developing algorithms and techniques that explicitly account for fairness and mitigate bias in machine learning models.
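As a rough illustration of bias detection, the sketch below computes two simple quantities: how well each group is represented in the data, and how often the model predicts the positive class for each group. The gap between the highest and lowest positive rate is commonly called the demographic parity difference. The group names and predictions are made-up toy values, and dedicated fairness libraries provide maintained versions of such metrics.

```python
from collections import Counter


def group_shares(groups):
    """Share of the dataset that belongs to each group."""
    counts = Counter(groups)
    total = len(groups)
    return {group: n / total for group, n in counts.items()}


def selection_rates(groups, predictions):
    """Fraction of positive predictions (1) within each group."""
    positives, totals = Counter(), Counter()
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        positives[group] += pred
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(groups, predictions):
    """Gap between the highest and lowest group selection rate (0 means equal rates)."""
    rates = selection_rates(groups, predictions)
    return max(rates.values()) - min(rates.values())


# Toy data: group "a" is heavily overrepresented.
groups = ["a"] * 8 + ["b"] * 2
predictions = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]

shares = group_shares(groups)
rates = selection_rates(groups, predictions)
gap = demographic_parity_difference(groups, predictions)
```

A large gap does not prove the model is unfair on its own, but it is a signal that the data and the model deserve a closer look for that group.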


There is a growing number of tools to detect, understand, and mitigate data bias, such as the What-If Tool, Fairlearn, FairML, and Aequitas.

### Model Card

A model card is a document that provides important information about a machine learning model, including its intended use, performance characteristics, potential limitations, and ethical considerations. Model cards are intended to promote transparency, accountability, and responsible use of machine learning models by providing stakeholders with comprehensive information about the model's development, deployment, and impact.

Here are some key components typically included in a model card:

- **Model Overview**: A brief description of the model, including its purpose, functionality, and intended application. This section provides context for understanding the model's capabilities and limitations.

- **Model Details**: Technical information about the model architecture, algorithms, and parameters used in its implementation. This section may include details such as input and output formats, preprocessing steps, and training procedures.

- **Performance Metrics**: Quantitative measures of the model's performance, such as accuracy, precision, recall, F1 score, or area under the ROC curve. Performance metrics provide insights into the model's effectiveness in making predictions and its overall reliability.

- **Dataset Information**: Details about the training data used to develop the model, including data sources, collection methods, preprocessing steps, and potential biases or limitations. Understanding the dataset is essential for assessing the generalizability and fairness of the model's predictions.

- **Evaluation Methodology**: Description of the evaluation process used to assess the model's performance, including validation techniques, cross-validation procedures, and metrics used for evaluation. Transparent evaluation methodologies help stakeholders understand the reliability and validity of the model's predictions.

- **Ethical Considerations**: Discussion of potential ethical implications, biases, or societal impacts associated with the model's use. This section may address issues such as fairness, privacy, security, and potential misuse of the model's predictions.

- **Use Cases and Examples**: Real-world examples or case studies demonstrating the model's application in different scenarios. Use cases help stakeholders understand how the model can be used effectively and responsibly in practice.

- **Limitations and Caveats**: Disclosure of the model's limitations, uncertainties, and areas where it may not perform well. Transparently communicating limitations helps manage expectations and informs decision-making about the model's deployment and use.

- **References and Citations**: Citations to relevant research papers, datasets, methodologies, and external resources used in the development and evaluation of the model. Providing references allows stakeholders to access additional information and validate the model's claims.
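One way to keep a model card consistent across models is to treat it as structured data and render it to markdown. The sketch below covers only a subset of the components listed above. The `ModelCard` class, its field names, and every value in the example are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """A subset of the usual model card components, stored as plain fields."""
    overview: str
    details: str
    metrics: dict
    dataset: str
    ethical_considerations: str
    limitations: str
    references: list = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the card as a markdown document with one section per component."""
        sections = [
            ("Model Overview", self.overview),
            ("Model Details", self.details),
            ("Performance Metrics", "\n".join(f"- {k}: {v}" for k, v in self.metrics.items())),
            ("Dataset Information", self.dataset),
            ("Ethical Considerations", self.ethical_considerations),
            ("Limitations and Caveats", self.limitations),
            ("References and Citations", "\n".join(f"- {r}" for r in self.references)),
        ]
        parts = ["# Model Card"]
        for title, body in sections:
            parts.append(f"## {title}\n{body}")
        return "\n\n".join(parts)


# All values below are invented placeholders.
card = ModelCard(
    overview="Classifies support tickets as urgent or not urgent.",
    details="Logistic regression on TF-IDF features.",
    metrics={"precision": 0.84, "recall": 0.78},
    dataset="Hypothetical set of 10,000 labeled tickets from 2023.",
    ethical_considerations="Tickets written in languages other than English are underrepresented.",
    limitations="Not evaluated on tickets shorter than five words.",
    references=["Internal evaluation notebook (placeholder)"],
)
markdown = card.to_markdown()
```

Because the card is generated from fields, a unit test can also check that no section is left empty before a model is released.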

Example 
![image.png](attachment:3c359683-886c-4bdd-a91b-035c014d6893.png)

### CI/CD

**Continuous Integration**

Continuous Integration (CI) is a software development practice where developers frequently integrate their code changes into a shared repository, such as a version control system like Git. Each integration triggers an automated build process, during which the code is compiled, tested, and analyzed for errors or issues. The primary goal of CI is to detect and address integration errors early in the development cycle, ensuring that the codebase remains stable and reliable.

### API 
When you create an API with FastAPI, there are a few components you should know.

**Endpoint**:

This is the URL where your API can be accessed. It's the entry point to interact with your API.

*Example: https://api.example.com/todos*

**HTTP Method (Verb):**

The HTTP method indicates the type of action the request performs on the resource. Common HTTP methods include GET, POST, PUT, and DELETE.

- GET: Retrieve data.
- POST: Create new data.
- PUT: Update existing data.
- DELETE: Delete existing data.

**Parameters**:

Parameters are additional data sent with the request, typically as query parameters for GET requests and in the request body for methods such as POST and PUT.

Example:

- Query Parameter: https://api.example.com/todos?status=completed
- Request Body Parameter (for POST and PUT requests):
```json
{
  "title": "Finish homework",
  "due_date": "2024-02-15",
  "status": "pending"
}
```

- Path Parameter: A variable segment of the URL path. The path is the part of the URL that comes after the domain and before any query parameters, and it defines the resource or endpoint being accessed.

  Example: in `/todos/{id}`, `{id}` is a path parameter.
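To make the difference between path and query parameters concrete, the sketch below takes a URL apart with Python's standard `urllib.parse` module. The URL is a hypothetical combination of the examples above (a to-do with id 17 plus a `status` filter). A framework such as FastAPI does this parsing for you.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical request URL combining a path parameter and a query parameter.
url = "https://api.example.com/todos/17?status=completed"

parts = urlsplit(url)

# The path comes after the domain and before the "?".
path_segments = parts.path.strip("/").split("/")
resource = path_segments[0]        # the collection being accessed
todo_id = int(path_segments[1])    # the value filling the {id} path parameter

# Query parameters come after the "?"; parse_qs maps each name to a list of values.
query = parse_qs(parts.query)
```

Here `parts.netloc` holds the domain, `parts.path` the path, and `query` the parsed query string.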

**To start the API application, run `uvicorn main:app --reload`.** This assumes the code is saved in `main.py` and the FastAPI instance is named `app`.

**The app is available locally at http://127.0.0.1:8000 by default.**

**FastAPI automatically generates interactive API documentation from your code. Go to http://127.0.0.1:8000/docs to access it.**

**If a function parameter is typed as a Pydantic model, FastAPI reads it from the request body, so you can send it in a POST request.**

**If a parameter has a simple type such as `int` or `str` and is not part of the path, FastAPI treats it as a query parameter.**

**Here are some examples:**


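In [None]:
# Install the dependencies once from a shell (or prefix each line with % inside a notebook):
#   pip install fastapi
#   pip install "uvicorn[standard]"

from fastapi import FastAPI
from pydantic import BaseModel


# Body model used by the POST example below.
# NOTE: TaggedItem is not defined in these notes; the fields here are an assumed, illustrative shape.
class TaggedItem(BaseModel):
    name: str
    tags: list[str] = []
    item_id: int


# Instantiate the app.
app = FastAPI()

# Define a GET on the specified endpoint.
@app.get("/")
async def say_hello():
    return {"greeting": "Hello World!"}

# Example of a path parameter (item_id) and a query parameter (count).
@app.get("/items/{item_id}")
async def get_items(item_id: int, count: int = 1):
    return {"fetch": f"Fetched {count} of {item_id}"}

# Example of a body parameter.
# This allows sending data (our TaggedItem) via POST to the API.
@app.post("/items/")
async def create_item(item: TaggedItem):
    return item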

In [None]:
# Unit testing APIs

# --- bar.py: the application under test ---
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Value(BaseModel):
    value: int


# Use the POST action to send data to the server.
# path comes from the URL path, query from the query string, body from the JSON body.
@app.post("/{path}")
async def exercise_function(path: int, query: int, body: Value):
    return {"path": path, "query": query, "body": body}


# --- test file (for example test_bar.py), run with pytest ---
from fastapi.testclient import TestClient

from bar import app

client = TestClient(app)


def test_post():
    # json= serializes the dict and sets the JSON Content-Type header.
    r = client.post("/42?query=5", json={"value": 10})
    assert r.status_code == 200
    print(r.json())
    assert r.json()["path"] == 42
    assert r.json()["query"] == 5
    assert r.json()["body"] == {"value": 10}


# You can also use the requests module against a running server.
# '/url/to/query/' is a placeholder: requests needs the full URL, including scheme and host.
# import requests
# r = requests.post('/url/to/query/', auth=('usr', 'pass'), json={"value": 10})


In [None]:
# You can validate input data from users with a Pydantic validator.
# Note: `validator` is the Pydantic v1 API; Pydantic v2 deprecates it in favor of `field_validator`.
from enum import Enum
from pydantic import BaseModel, validator


class Profession(str, Enum):
    DS = "data scientist"
    MLE = "machine learning scientist"
    RS = "research scientist"


class NewHire(BaseModel):
    profession: Profession
    name: str

    # Runs whenever a NewHire is created; raising ValueError makes validation fail.
    @validator('name')
    def name_must_contain_space(cls, v):
        if ' ' not in v:
            raise ValueError('Name must contain a space for first and last name.')
        return v
