### Question 1
#### Install uv
#### What's the version of uv you installed?
#### Use --version to find out

In [1]:
# uv version
!uv --version

uv 0.9.5


#### Initialize an empty uv project
#### Create an empty folder for the homework and initialize the project there.

### Question 2
#### Use uv to install Scikit-Learn version 1.6.1
#### What's the first hash for Scikit-Learn you get in the lock file?
#### Include the entire string starting with sha256:, don't include quotes

#### sha256:3faa5c39054b2f03ca547da9b2f52fde67c06240c31853f306aea97f13647b55
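
The hash can also be read from the lock file with a short script instead of by eye. The sketch below assumes the lock file is named `uv.lock` and that each package block opens with a `[[package]]` header followed by a `name = "..."` line, with the `sha256:` hashes listed after it. The helper name `first_hash` is hypothetical.

```python
import re
from typing import Optional


def first_hash(lock_text: str, package: str) -> Optional[str]:
    """Return the first sha256 hash listed in the given package's block.

    Assumes a uv.lock-style layout: each block starts with "[[package]]"
    and contains a line of the form: name = "<package>".
    """
    # Locate the line that names the package (dependency entries are indented,
    # so they do not match a line that starts with "name = ").
    start = re.search(rf'^name = "{re.escape(package)}"\s*$', lock_text, re.MULTILINE)
    if start is None:
        return None
    # Restrict the search to this package's block only.
    block = lock_text[start.end():].split("[[package]]", 1)[0]
    digest = re.search(r"sha256:[0-9a-f]{64}", block)
    return digest.group(0) if digest else None


# Usage (assumes uv.lock is in the current directory):
# with open("uv.lock", encoding="utf-8") as f:
#     print(first_hash(f.read(), "scikit-learn"))
```

Limiting the search to one block avoids returning a hash that belongs to the next package when the requested one lists none.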


### Models
#### We have prepared a pipeline with a dictionary vectorizer and a model.

#### It was trained (roughly) using this code:


In [2]:
"""
categorical = ['lead_source']
numeric = ['number_of_courses_viewed', 'annual_income']

df[categorical] = df[categorical].fillna('NA')
df[numeric] = df[numeric].fillna(0)

train_dict = df[categorical + numeric].to_dict(orient='records')

pipeline = make_pipeline(
    DictVectorizer(),
    LogisticRegression(solver='liblinear')
)

pipeline.fit(train_dict, y_train)
"""

#### Note: You don't need to train the model. This code is just for your reference.
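
To make the scoring step concrete, here is a minimal pure-Python sketch of what a pipeline like this computes at prediction time. String features are one-hot encoded as `name=value`, numeric features pass through unchanged, and a sigmoid is applied to the weighted sum. The weights and intercept are made-up placeholders, not the values stored in the pickled pipeline.

```python
import math


def vectorize(record: dict) -> dict:
    """Mimic the DictVectorizer encoding: strings become one-hot 'key=value' features."""
    features = {}
    for key, value in record.items():
        if isinstance(value, str):
            features[f"{key}={value}"] = 1.0
        else:
            features[key] = float(value)
    return features


def predict_proba(record: dict, weights: dict, intercept: float) -> float:
    """Logistic regression: sigmoid of the intercept plus the weighted feature sum."""
    features = vectorize(record)
    # Features the model never saw during training contribute nothing.
    z = intercept + sum(weights.get(name, 0.0) * x for name, x in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights, for illustration only.
weights = {
    "lead_source=paid_ads": 0.1,
    "number_of_courses_viewed": 0.3,
    "annual_income": -0.00001,
}
record = {"lead_source": "paid_ads", "number_of_courses_viewed": 2, "annual_income": 79276.0}
probability = predict_proba(record, weights, intercept=0.0)
print(round(probability, 3))
```

The real pipeline does the same two steps through `DictVectorizer` and `LogisticRegression`, which is why a plain dictionary can be passed straight to `predict_proba`.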

#### The pipeline was then saved with Pickle. Download it with wget:

##### wget https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin


### Question 3
#### Let's use the model!
#### Write a script for loading the pipeline with pickle
#### Score this record:
#### {
####    "lead_source": "paid_ads",
####    "number_of_courses_viewed": 2,
####    "annual_income": 79276.0
#### }
##### What's the probability that this lead will convert?


In [3]:
!wget https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin -O pipeline_v1.bin


--2025-10-28 10:25:35--  https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin
Resolving github.com (github.com)... 20.26.156.215
Connecting to github.com (github.com)|20.26.156.215|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin [following]
--2025-10-28 10:25:35--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1300 (1.3K) [application/octet-stream]
Saving to: ‘pipeline_v1.bin’


2025-10-28 10:25:35 (76.3 MB/s) - ‘pipeline_v1.bin’ saved [1300/1300]



In [4]:
import pickle
from fastapi import FastAPI
from pydantic import BaseModel
import requests

In [5]:


with open("pipeline_v1.bin", "rb") as f:
    model = pickle.load(f)


In [None]:
# Define the record to score
record = {
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0
}

# Get prediction probability
prediction = model.predict_proba([record])[0, 1]  
print(f"Probability: {prediction:.3f}")
percentage = prediction * 100
print(f"The probability that this lead will convert is {percentage:.2f}%")


Probability: 0.534
The probability that this lead will convert is 53.36%


### Question 4
#### Now let's serve this model as a web service

#### Install FastAPI
#### Write FastAPI code for serving the model
#### Now score this client using requests:


In [9]:


# Define input schema
class Client(BaseModel):
    lead_source: str
    number_of_courses_viewed: int
    annual_income: float

# Load the trained model
with open("pipeline_v1.bin", "rb") as f_in:
    model = pickle.load(f_in)

app = FastAPI()

@app.get("/")
def root():
    return {"message": "Model is live and ready!"}

@app.post("/predict")
def predict(client: Client):
    client_dict = client.dict()
    # Probability of the positive class for this single record
    prediction = model.predict_proba([client_dict])[0, 1]
    return {"probability": float(prediction)}


#### What's the probability that this client will get a subscription?

In [10]:
url = "http://127.0.0.1:8000/predict"
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}

response = requests.post(url, json=client)
print(response.json())


{'probability': 0.5340417283801275}


### Docker
#### Install Docker. We will use it for the next two questions.

#### For these questions, we prepared a base image: agrigorev/zoomcamp-model:2025. You'll need to use it (see Question 5 for an example).

#### This image is based on python:3.13.5-slim-bookworm and contains a pipeline with logistic regression (a different one) as well as a dictionary vectorizer.



In [None]:
# This is what the Dockerfile for this image looks like:
"""
FROM python:3.13.5-slim-bookworm
WORKDIR /code
COPY pipeline_v2.bin .
"""
# We already built it and then pushed it to agrigorev/zoomcamp-model:2025.
# Note: You don't need to build this docker image, it's just for your reference.


### Question 5
#### Download the base image agrigorev/zoomcamp-model:2025. You can do this with the docker pull command.
#### What's the size of this base image?
#### You can find it by running docker images; it is in the "SIZE" column.

In [None]:
# docker pull agrigorev/zoomcamp-model:2025

In [None]:
!docker images

REPOSITORY                 TAG       IMAGE ID       CREATED          SIZE
zoomcamp-app               latest    6e5685d7fd28   33 minutes ago   947MB
agrigorev/zoomcamp-model   2025      4a9ecc576ae9   7 days ago       121MB
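
If you want to pull the value out programmatically, the table can be parsed with a few lines of Python. This is a sketch that assumes the column layout shown above. The helper name `image_size` is hypothetical, and it works on captured text rather than calling Docker itself.

```python
from typing import Optional


def image_size(docker_images_output: str, repository: str, tag: str) -> Optional[str]:
    """Return the SIZE column for repository:tag, or None if the image is not listed."""
    lines = docker_images_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        columns = line.split()
        if len(columns) < 3:
            continue
        # REPOSITORY and TAG are the first two columns; SIZE is always the last.
        if columns[0] == repository and columns[1] == tag:
            return columns[-1]
    return None


output = """\
REPOSITORY                 TAG       IMAGE ID       CREATED          SIZE
zoomcamp-app               latest    6e5685d7fd28   33 minutes ago   947MB
agrigorev/zoomcamp-model   2025      4a9ecc576ae9   7 days ago       121MB
"""
print(image_size(output, "agrigorev/zoomcamp-model", "2025"))  # prints 121MB
```

Reading SIZE as the last token sidesteps the CREATED column, whose value ("33 minutes ago") contains spaces. To work on live output, the text could be captured with `subprocess.run(["docker", "images"], capture_output=True, text=True).stdout`.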


### Dockerfile
#### Now create your own Dockerfile based on the image we prepared.
#### It should start like this:

#### "FROM agrigorev/zoomcamp-model:2025
####  add your stuff here"
#### Now complete it:

#### Install all the dependencies from pyproject.toml
#### Copy your FastAPI script
#### Run it with uvicorn
#### After that, you can build your docker image.

### Question 6
#### Let's run your docker container!
#### After running it, score this client once again

#### What's the probability that this lead will convert?

In [None]:
# Build the Docker image
# docker build -t zoomcamp-app .

In [None]:
# Start the container and map port 8000:
# docker run -it --rm -p 8000:8000 zoomcamp-app

In [None]:
# Send the request

url = "http://localhost:8000/predict"  
client = {
    "lead_source": "organic_search",
    "number_of_courses_viewed": 4,
    "annual_income": 80304.0
}

response = requests.post(url, json=client).json()
print(response)


{'probability': 0.5340417283801275}
