# Data: The Superpower Behind AI

This work is licensed under CC BY-NC-SA 4.0 https://creativecommons.org/licenses/by-nc-sa/4.0/

This notebook was developed by E. Boonen for workshops at Thomas More Mechelen. More information about our data programme is available at https://thomasmore.be/ba-adi-me

The accompanying workshop and presentation also pay close attention to the ethical use of these technologies.

## Scraping text
### Saving Netflix series to a CSV file

In [1]:
import pandas as pd

url = "https://nl.wikipedia.org/wiki/Netflix"
tables = pd.read_html(url)

# List that collects the tables we want to keep
data = []
columns = ["Titel", "Genre", "Première"]  # column headers as used on the Dutch Wikipedia page

for df in tables:
    # Keep only tables that contain all required columns
    if all(col in df.columns for col in columns):
        data.append(df[columns])  # keep only the relevant columns

# pd.concat raises an unclear error on an empty list, so fail with a clear message instead
if not data:
    raise ValueError("No table with the expected columns was found")

final_data = pd.concat(data, ignore_index=True)
final_data.to_csv("netflix_series.csv", index=False)
print("All series saved!")


All series saved!


### Adding ratings via an API

Request an API key at https://www.omdbapi.com/

In [2]:
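import os

import pandas as pd
import omdb

# Read the API key from the environment instead of hard-coding it in the notebook.
# OMDB_API_KEY is an assumed variable name; the key is limited to 1,000 requests per day.
omdb.set_default('apikey', os.environ['OMDB_API_KEY'])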

csv_filename = "netflix_series.csv"
df = pd.read_csv(csv_filename)

imdb_ratings = []
for titel in df['Titel']:
    try:
        result = omdb.get(title=titel)
        rating = result.get('imdb_rating', 'N/A')
    except Exception:
        # Any failed lookup (network error, request limit reached, ...) is recorded as 'N/A'
        rating = 'N/A'

    print(f"{titel}: IMDb Rating - {rating}")
    imdb_ratings.append(rating)

df['IMDb Rating'] = imdb_ratings

updated_csv_filename = "netflix_series_with_ratings.csv"
df.to_csv(updated_csv_filename, index=False)

print("Ratings added")


House of Cards: IMDb Rating - N/A
Hemlock Grove: IMDb Rating - N/A
Orange Is the New Black: IMDb Rating - N/A
Marco Polo: IMDb Rating - N/A
Bloodline: IMDb Rating - N/A
Daredevil: IMDb Rating - N/A
Sense8: IMDb Rating - N/A
Narcos: IMDb Rating - N/A
Jessica Jones: IMDb Rating - N/A
Stranger Things: IMDb Rating - N/A
The Get Down: IMDb Rating - N/A
Luke Cage: IMDb Rating - N/A
The Crown: IMDb Rating - N/A
A Series of Unfortunate Events: IMDb Rating - N/A
Iron Fist: IMDb Rating - N/A
13 Reasons Why: IMDb Rating - N/A
Ozark: IMDb Rating - N/A
The Defenders: IMDb Rating - N/A
Mindhunter: IMDb Rating - N/A
The Punisher: IMDb Rating - N/A
Godless: IMDb Rating - N/A
Lost in Space: IMDb Rating - N/A
The Haunting of Hill House: IMDb Rating - N/A
Chilling Adventures of Sabrina: IMDb Rating - N/A
Narcos: Mexico: IMDb Rating - N/A
The Umbrella Academy: IMDb Rating - N/A
The Society: IMDb Rating - N/A
When They See Us: IMDb Rating - N/A
Another Life: IMDb Rating - N/A
The Dark Crystal: Age of Resis
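
The lookup loop stores the string 'N/A' whenever no rating comes back, and such values cannot be used in a distance calculation. The following is a minimal sketch, assuming ratings arrive as strings such as "8.6" or "N/A", of converting them to numbers first. `parse_rating` is a hypothetical helper written for this illustration and is not part of the omdb package.

```python
from typing import Optional


def parse_rating(raw: object) -> Optional[float]:
    """Convert a raw rating value to a float, or None when it is missing."""
    try:
        value = float(str(raw).strip())
    except ValueError:
        return None
    # IMDb ratings use a 10-point scale; anything outside 0-10 is treated as
    # missing (this also rejects NaN, because comparisons with NaN are False).
    return value if 0.0 <= value <= 10.0 else None


# Made-up raw values, for illustration only.
raw_ratings = ["8.6", "N/A", " 7.2 ", "", None]
print([parse_rating(r) for r in raw_ratings])
```

When the CSV is read back with `pd.read_csv`, the string "N/A" is by default already interpreted as a missing value, so a helper like this is mainly useful when working with the raw API results.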

### Making a recommendation based on genre and rating

In [5]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics.pairwise import euclidean_distances

df = pd.read_csv("netflix_series_with_ratings.csv")
df = df.dropna(subset=['IMDb Rating'])  # 'N/A' is read as NaN and cannot be used in distances

title = "Daredevil"  # a title we liked (try Daredevil, Dark, Klaus, Eli)

movie = df[df['Titel'] == title].iloc[0]
same_genre = df[df['Genre'] == movie['Genre']].copy()  # titles with the same genre

# If no other title has exactly the same genre, fall back to all titles
if len(same_genre) < 2:
    same_genre = df.copy()

# Scale the IMDb ratings to the range 0-1 and apply the same scaling to the liked title,
# so that both sides of the distance calculation are on the same scale
scaler = MinMaxScaler()
same_genre['scaled_rating'] = scaler.fit_transform(same_genre[['IMDb Rating']])
movie_vector = scaler.transform(pd.DataFrame({'IMDb Rating': [movie['IMDb Rating']]}))
distances = euclidean_distances(movie_vector, same_genre[['scaled_rating']])

# Add the distance to the dataset, drop the liked title itself and sort by smallest distance
same_genre['distance'] = distances[0]
recommendations = same_genre[same_genre['Titel'] != title].sort_values(by='distance').head(5)
print(recommendations[['Titel', 'Genre', 'IMDb Rating']])


            Titel          Genre  IMDb Rating
19   The Punisher  Marvel Comics          8.4
8   Jessica Jones  Marvel Comics          7.8
11      Luke Cage  Marvel Comics          7.2
17  The Defenders  Marvel Comics          7.2
14      Iron Fist  Marvel Comics          6.4
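
To make the idea behind the recommendation explicit, here is a minimal sketch in plain Python, without pandas or scikit-learn. It rescales the ratings to the range 0-1 and ranks the other titles by how close their scaled rating is to that of the liked title. The function names, the titles and the ratings are made up for illustration and are not taken from the dataset.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def nearest_titles(ratings, liked, k=3):
    """Return the k titles whose scaled rating is closest to the liked title's."""
    titles = list(ratings)
    scaled = dict(zip(titles, min_max_scale([ratings[t] for t in titles])))
    others = [t for t in titles if t != liked]
    # In one dimension the Euclidean distance is simply the absolute difference.
    return sorted(others, key=lambda t: abs(scaled[t] - scaled[liked]))[:k]


# Made-up titles and ratings, for illustration only.
ratings = {"Show A": 8.5, "Show B": 8.0, "Show C": 6.5, "Show D": 9.5, "Show E": 7.0}
print(nearest_titles(ratings, "Show A", k=2))  # ['Show B', 'Show D']
```

With a single feature the scaling does not change the ranking. It starts to matter once more features with different ranges, such as release year or number of seasons, are added to the distance.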


## Scraping images
### Downloading all webcam images to your computer

https://www.meteoblue.com/ offers public webcam images from Brussels, Paris, New York, Amsterdam and other cities (use the site's search to find them).

In [6]:
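import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

page_url = "https://www.meteoblue.com/nl/weer/webcams/brussel_belgi%C3%AB_2800866"

response = requests.get(page_url, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.content, "html.parser")
images = soup.find_all('img')

os.makedirs("img", exist_ok=True)  # the target folder must exist before writing
count = 0

for img in images:
    src = img.get("src")
    if not src or src.startswith("data:"):
        continue  # skip images without a downloadable source

    img_url = urljoin(page_url, src)  # resolve relative links against the page URL
    filename = "img/meteoblue_" + str(count) + ".jpg"
    r = requests.get(img_url, allow_redirects=True, timeout=30)
    with open(filename, 'wb') as file:
        file.write(r.content)

    count += 1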
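
Not every `<img>` tag on a page points to a downloadable file: some have no `src` attribute, some embed the image as a `data:` URI, and some use a link relative to the page. The sketch below shows one way to turn the tags into absolute URLs before downloading. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages; the class name, the function name, the HTML fragment and the example.com addresses are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_urls(html, page_url):
    """Return absolute http(s) URLs for the images in an HTML snippet."""
    parser = ImgSrcCollector()
    parser.feed(html)
    # Relative links are resolved against the address of the page itself.
    absolute = [urljoin(page_url, src) for src in parser.sources]
    return [u for u in absolute if u.startswith(("http://", "https://"))]


# Hypothetical page fragment and address, for illustration only.
sample_html = """
<img src="/cams/one.jpg">
<img src="https://cdn.example.com/two.jpg">
<img alt="no source">
<img src="data:image/gif;base64,R0lGODlhAQABAAAAACw=">
"""
print(image_urls(sample_html, "https://example.com/webcams/page"))
```

Only the first two tags yield a URL; the tag without `src` and the embedded `data:` image are skipped.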

### Using object recognition

In [8]:
import cv2
import numpy as np

# Paths to the model configuration, the weights and the class labels
config_path = "yolov3.cfg"
weights_path = "yolov3.weights"
labels_path = "coco.names"

# Load the COCO labels
with open(labels_path, "r") as f:
    labels = [line.strip() for line in f]

# Load the YOLO model
net = cv2.dnn.readNetFromDarknet(config_path, weights_path)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

# Load the image
image_path = "img/meteoblue_2.jpg"
image = cv2.imread(image_path)
if image is None:  # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError(f"Could not read {image_path}")
height, width = image.shape[:2]

# YOLO pre-processing: scale pixel values to 0-1, resize to 416x416 and swap BGR to RGB
blob = cv2.dnn.blobFromImage(image, 1/255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# Get the names of the output layers
output_layers = net.getUnconnectedOutLayersNames()

# Run the detection
detections = net.forward(output_layers)

# Collect bounding boxes and detected objects
boxes = []
confidences = []
class_ids = []
object_counts = {}

for output in detections:
    for detection in output:
        scores = detection[5:]  # class scores
        class_id = int(np.argmax(scores))
        confidence = scores[class_id]

        if confidence > 0.5:  # detection threshold
            # YOLO returns the box centre and size relative to the image dimensions
            box = detection[0:4] * np.array([width, height, width, height])
            (centerX, centerY, w, h) = box.astype("int")

            # Convert the centre point to the top-left corner
            x = int(centerX - (w / 2))
            y = int(centerY - (h / 2))

            boxes.append([x, y, int(w), int(h)])
            confidences.append(float(confidence))
            class_ids.append(class_id)

            # Count the object. Note: this happens before non-maximum suppression,
            # so overlapping detections of the same object are each counted.
            label = labels[class_id]
            object_counts[label] = object_counts.get(label, 0) + 1

# Apply non-maximum suppression to avoid duplicate detections
indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

# Colours for the bounding boxes
colors = np.random.uniform(0, 255, size=(len(labels), 3))

# Draw the detections
if len(indices) > 0:
    for i in indices.flatten():
        (x, y, w, h) = boxes[i]
        color = [int(c) for c in colors[class_ids[i]]]
        label = f"{labels[class_ids[i]]}: {confidences[i]:.2f}"

        # Draw the bounding box and its label
        cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
        cv2.putText(image, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

# Save the annotated image
output_image_path = "output_detected.jpg"
cv2.imwrite(output_image_path, image)

# Print the detected objects
print("Objects:")
for obj, count in object_counts.items():
    print(f"{count}x {obj}")


Objects:
28x person
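
The detection code relies on non-maximum suppression (NMS) to remove boxes that overlap a stronger detection of the same object. The sketch below is a plain greedy NMS in pure Python that shows the principle. It is not OpenCV's implementation, and `cv2.dnn.NMSBoxes` may differ in its details. Boxes use the same `(x, y, w, h)` layout as in the notebook, the default threshold of 0.4 mirrors the value passed to `NMSBoxes`, and the function names and sample detections are made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.4):
    """Keep the highest-scoring box of each overlapping group; return kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a box kept earlier.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Made-up detections: the first two boxes cover almost the same area.
boxes = [(10, 10, 100, 200), (14, 12, 100, 200), (300, 50, 80, 160)]
scores = [0.90, 0.75, 0.60]
print(greedy_nms(boxes, scores))  # [0, 2]
```

Counting only the boxes that survive this step, rather than every detection above the confidence threshold, avoids counting the same object more than once.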
