# **Deployment homework**

We recommend using Python 3.12 or 3.13 for this homework.

In this homework, we're going to continue working with the lead scoring dataset. You don't need the dataset: we will provide the model for you.

### **<font color='red'>Question 1</font>**
- Install `uv`
- What's the version of uv you installed?
- Use `--version` to find out

In [1]:
!uv --version

uv 0.9.5


### Initialize an empty uv project
Create an empty folder for the homework and initialize the project there.

### **<font color='red'>Question 2</font>**
- Use `uv` to install Scikit-Learn version 1.6.1
- What's the first hash for Scikit-Learn you get in the lock file?
- Include the entire string starting with `sha256:`, without the quotes

In [2]:
!awk '/\[\[package\]\]/ {pkg=""} /name = "scikit-learn"/ {pkg=1} pkg && /sha256:/ {print; exit}' uv.lock

sdist = { url = "https://files.pythonhosted.org/packages/9e/a5/4ae3b3a0755f7b35a280ac90b28817d1f380318973cff14075ab41ef50d9/scikit_learn-1.6.1.tar.gz", hash = "sha256:b4fc2525eca2c69a59260f583c56a7557c6ccdf8deafdba6e060f94c1c59738e", size = 7068312, upload-time = "2025-01-10T08:07:55.348Z" }


In [3]:
import re

with open("uv.lock", "r", encoding="utf-8") as f:
    inside = False
    for line in f:
        if line == 'name = "scikit-learn"\n':
            inside = True
            continue
        if inside:
            match = re.search(r"sha256:([a-f0-9]+)", line)
            if match:
                print("sha256:" + match.group(1))
                break

sha256:b4fc2525eca2c69a59260f583c56a7557c6ccdf8deafdba6e060f94c1c59738e


### Models
We have prepared a pipeline with a dictionary vectorizer and a model.

It was trained (roughly) using this code:

```python
categorical = ['lead_source']
numeric = ['number_of_courses_viewed', 'annual_income']

df[categorical] = df[categorical].fillna('NA')
df[numeric] = df[numeric].fillna(0)

train_dict = df[categorical + numeric].to_dict(orient='records')

pipeline = make_pipeline(
    DictVectorizer(),
    LogisticRegression(solver='liblinear')
)

pipeline.fit(train_dict, y_train)
```

> **Note:** You don't need to train the model. This code is just for your reference.
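
For intuition, here is a rough pure-Python sketch of what such a pipeline does at prediction time: string-valued features are one-hot encoded as `feature=value` columns, and the probability is a sigmoid of a weighted sum. The function names are hypothetical and the all-zero weights are placeholders, not the parameters of the provided model.

```python
import math


def encode(record):
    """Simplified imitation of DictVectorizer: strings become one-hot features."""
    features = {}
    for key, value in record.items():
        if isinstance(value, str):
            features[f"{key}={value}"] = 1.0  # one-hot column for this category
        else:
            features[key] = float(value)  # numeric values pass through
    return features


def predict_proba(features, weights, bias):
    """Logistic regression: sigmoid of bias plus the weighted feature sum."""
    score = bias + sum(weights.get(name, 0.0) * value
                       for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


record = {
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0,
}
features = encode(record)
print(features)

# Placeholder parameters (all zeros), so the score is 0 and the sigmoid gives 0.5.
print(predict_proba(features, weights={}, bias=0.0))  # 0.5
```

The real pipeline learns its weights during `fit`, so its output for this record differs from the placeholder value.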

The trained pipeline was saved with **Pickle**.  
You can download it using the following command:

```bash
wget https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin
```

### **<font color='red'>Question 3</font>**
Let's use the model!

Write a script that loads the pipeline with pickle, then use it to score this record:

```python
{
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0
}
```

What's the probability that this lead will convert?

- 0.333  
- <font color='green'>0.533</font> ✅ (closest option to the computed 0.534)
- 0.733  
- 0.933  

If you get errors when unpickling the file, check its checksum:

```
$ md5sum pipeline_v1.bin
7d17d2e4dfbaf1e408e1a62e6e880d49 *pipeline_v1.bin
```
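
Where `md5sum` is not available, the same check can be sketched in Python with `hashlib` once the file has been downloaded. The helper names `md5_of_file` and `checksum_matches` are illustrative; the expected digest is the one quoted above.

```python
import hashlib

# Expected digest, taken from the md5sum output quoted in the homework text.
EXPECTED_MD5 = "7d17d2e4dfbaf1e408e1a62e6e880d49"


def md5_of_file(path, chunk_size=8192):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f_in:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected
```

Calling `checksum_matches("pipeline_v1.bin")` should return `True` for an intact download.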

In [4]:
!wget -O pipeline_v1.bin https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin

--2025-10-23 11:38:06--  https://github.com/DataTalksClub/machine-learning-zoomcamp/raw/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin
Resolving github.com (github.com)... 140.82.121.4, 64:ff9b::8c52:7904
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin [following]
--2025-10-23 11:38:07--  https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/refs/heads/master/cohorts/2025/05-deployment/pipeline_v1.bin
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1300 (1.3K) [application/octet-stream]
Saving to: ‘pipeline_v1.b

In [5]:
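import pickle

model_file = 'pipeline_v1.bin'

# Only unpickle files from a trusted source: pickle can execute arbitrary code.
with open(model_file, 'rb') as f_in:
    # Unpack the loaded pipeline into its two steps: vectorizer and model.
    dv, model = pickle.load(f_in)

dv, model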

(DictVectorizer(), LogisticRegression(solver='liblinear'))

In [7]:
lead = {
    "lead_source": "paid_ads",
    "number_of_courses_viewed": 2,
    "annual_income": 79276.0
}
X = dv.transform([lead])
model.predict_proba(X)[0,1].round(3)

np.float64(0.534)
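
The cells above can also be packaged as a standalone script, as the question asks. This is a minimal sketch: the function names `load_pipeline` and `predict_single` are hypothetical, and it assumes the unpickled object exposes `predict_proba` and accepts a list of dictionaries, as a pipeline starting with a `DictVectorizer` does.

```python
import pickle


def load_pipeline(path):
    """Load a pickled object from disk. Only use files from a trusted source."""
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


def predict_single(pipeline, record):
    """Return the probability of the positive class for one record."""
    # predict_proba returns one row per record: [P(class 0), P(class 1)].
    proba = pipeline.predict_proba([record])
    return float(proba[0][1])


# Example usage (requires scikit-learn and the downloaded pipeline_v1.bin):
#
# pipeline = load_pipeline("pipeline_v1.bin")
# lead = {
#     "lead_source": "paid_ads",
#     "number_of_courses_viewed": 2,
#     "annual_income": 79276.0,
# }
# print(round(predict_single(pipeline, lead), 3))
```

Passing the whole pipeline to `predict_single` avoids transforming the record by hand, because the pipeline applies the vectorizer before the model.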