# AI In Web

- Notebook Author: Ahmed Métwalli

This notebook contains the full content of the AI in Web course; make sure to solve the tutorial exercises in each section.
I recommend creating a separate environment for this course with Python 3.11.x.

# Section 1.1: Common Regex Practices

### Regex Hands-on

Common regex patterns (https://docs.python.org/3/library/re.html):
- `.`: Matches any character except a newline.
- `^`: Matches the start of the string.
- `$`: Matches the end of the string.
- `*`: Matches 0 or more repetitions of the preceding element.
- `+`: Matches 1 or more repetitions of the preceding element.
- `?`: Matches 0 or 1 repetition of the preceding element.
- `{n}`: Matches exactly n repetitions of the preceding element.
- `{n,}`: Matches n or more repetitions of the preceding element.
- `{n,m}`: Matches between n and m repetitions of the preceding element.
- `[]`: Matches any one of the enclosed characters.
- `|`: Alternation; matches either the pattern before or after the `|`.
- `()`: Groups multiple patterns into one.
- `\d`: Matches any digit (for ASCII text, equivalent to `[0-9]`).
- `\D`: Matches any non-digit character.
- `\s`: Matches any whitespace character (spaces, tabs, newlines).
- `\S`: Matches any non-whitespace character.
- `\b`: Matches a word boundary (the position between a word character and a non-word character).
- `\B`: Matches a non-word boundary.
- `\w`: Matches any word character (alphanumeric plus underscore).
- `\W`: Matches any non-word character.

- Practice REGEX: https://regex101.com/r/rsVgaP/1
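A quick way to check the table above is to run a few of the tokens on short strings. This is a minimal sketch using only the standard `re` module; the sample strings are made up for illustration.

```python
import re

text = "a ab abbb"

# Quantifiers: * (0 or more), + (1 or more), ? (0 or 1), {n,m} (between n and m)
zero_or_more = re.findall(r"ab*", text)               # ['a', 'ab', 'abbb']
one_or_more = re.findall(r"ab+", text)                # ['ab', 'abbb']
optional_u = re.findall(r"colou?r", "color colour")   # ['color', 'colour']
two_to_three = re.findall(r"\d{2,3}", "7 42 1234")    # ['42', '123']

# Anchors: ^ and $ pin the match to the start and end of the string
whole_string = bool(re.search(r"^\d+$", "2024"))      # True
not_whole = bool(re.search(r"^\d+$", "2024a"))        # False

# \b word boundaries: match "cat" only as a whole word
whole_word = re.findall(r"\bcat\b", "cat concat cats")   # ['cat']

# Alternation inside a group; with one group, findall returns the group's text
animals = re.findall(r"(cat|dog)s?", "cats and dogs")    # ['cat', 'dog']

# \D (non-digit) and \s (whitespace) used in substitutions
digits_only = re.sub(r"\D", "", "+1 (800) 555-5555")     # '18005555555'
collapsed = re.sub(r"\s+", " ", "a  b\t\nc")             # 'a b c'

print(zero_or_more, one_or_more, optional_u, two_to_three)
print(whole_string, not_whole, whole_word, animals, digits_only, collapsed)
```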

In [None]:
import re
import pandas as pd
import numpy as np


In [None]:
# Example strings
text = "You should call 911 now. 911 is the emergency number"


In [None]:
# Find all numbers in the text
pattern = ...
matches = ...
print(f"Numbers found: {matches}")

Numbers found: ['911', '911']


In [None]:
# Replace all numbers with the word '[NUM]'
pattern = ...
replaced_text = ...
print(replaced_text)

You should call [NUM] now. [NUM] is the emergency number
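For reference, here is a sketch of the two calls these cells practice, `re.findall` and `re.sub`, applied to a separate made-up sentence so the exercises above stay open.

```python
import re

sample = "Order 66 shipped 3 items on day 12."

# \d+ matches one or more consecutive digits, so each number is returned whole
numbers = re.findall(r"\d+", sample)
print(numbers)  # ['66', '3', '12']

# re.sub replaces every match of the pattern with the given string
masked = re.sub(r"\d+", "[NUM]", sample)
print(masked)  # Order [NUM] shipped [NUM] items on day [NUM].
```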


In [None]:
# Extract Email 
data = {'emails': ['john.doe@example.com', 'jane_smith@abc.co.uk', 'invalid.email@com']}
df = pd.DataFrame(data)

# Extract the username separately
# ^: Start of the string.
# ([\w.]+): Captures the username part of the email.
# [\w.]: Matches any word character (letters, digits, underscores) or a literal dot.
# +: One or more of the preceding characters.
# @: Matches the literal @ symbol, which is required to separate the username and domain.
df['username'] = ... # using df[col].str.extract

# Extract the domain separately
# @: Matches the literal @ symbol, which precedes the domain part.
# ([\w.-]+\.[a-zA-Z]{2,}): Captures the domain part of the email.
# [\w.-]+: Matches the main domain part, including letters, digits, hyphens, and dots.
# \.[a-zA-Z]{2,}: Matches the top-level domain (TLD) with at least two letters.
# $: End of the string.
df['domain'] = ... # using df[col].str.extract
df


| | emails | username | domain |
|---|---|---|---|
| 0 | john.doe@example.com | john.doe | example.com |
| 1 | jane_smith@abc.co.uk | jane_smith | abc.co.uk |
| 2 | invalid.email@com | invalid.email | |
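The same two patterns can be tried with plain `re` before handing them to pandas. This sketch assumes the patterns described in the comments above (`^([\w.]+)@` for the username and `@([\w.-]+\.[a-zA-Z]{2,})$` for the domain); `first_group` is a hypothetical helper. A pattern with one capture group can then be passed to `Series.str.extract`, which leaves `NaN` in rows where it does not match.

```python
import re

# Patterns taken from the comment breakdown in the cell above
USERNAME_PATTERN = re.compile(r"^([\w.]+)@")
DOMAIN_PATTERN = re.compile(r"@([\w.-]+\.[a-zA-Z]{2,})$")


def first_group(pattern, text):
    """Return the first capture group, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


emails = ['john.doe@example.com', 'jane_smith@abc.co.uk', 'invalid.email@com']
parts = [(first_group(USERNAME_PATTERN, e), first_group(DOMAIN_PATTERN, e)) for e in emails]
print(parts)
# [('john.doe', 'example.com'), ('jane_smith', 'abc.co.uk'), ('invalid.email', None)]
```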


In [None]:
# Email Validation
# Note: the pattern allows %, + and - in usernames; real email providers differ in which special characters they accept
def validate_email(email):
    pattern = ...
    # ^(?!.*\.\.): Negative lookahead to ensure there are no consecutive dots in the email string.
    # [a-zA-Z0-9._%+-]+: Matches the local part (username) of the email. Allows letters, digits, and special characters such as ., _, %, +, -.
    # @[a-zA-Z0-9.-]+: Matches the domain part. Allows letters, digits, hyphens, and dots. This pattern allows a single dot but not consecutive dots within the domain part.
    # \.[a-zA-Z]{2,6}$: Matches the TLD with 2 to 6 alphabetic characters, which covers most common TLDs like .com, .org, .museum, etc.
    return bool(re.match(pattern, email))
# Test the refined function
emails = ['test.email@example.com', 'invalid-email@.com', 'name@domain.co', 'test..email@example.com', 'test@domain.c', 'test@domain.toolongtld']
results = [validate_email(email) for email in emails]
print(f"Validation Results: {results}")



Validation Results: [True, False, True, False, False, False]
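One way to fill in `validate_email`, assuming the pattern is assembled from the four comment lines in the cell above. The last lines show a subtlety worth knowing: `$` also matches just before a trailing newline, so `re.fullmatch` is the stricter choice.

```python
import re

# Pattern assembled from the commented breakdown in the cell above (an assumption)
EMAIL_PATTERN = r"^(?!.*\.\.)[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}$"


def validate_email(email):
    # re.match anchors at the start; the trailing $ anchors the end
    return bool(re.match(EMAIL_PATTERN, email))


emails = ['test.email@example.com', 'invalid-email@.com', 'name@domain.co',
          'test..email@example.com', 'test@domain.c', 'test@domain.toolongtld']
results = [validate_email(e) for e in emails]
print(results)  # [True, False, True, False, False, False]

# $ also matches just before a trailing newline; fullmatch rejects it
loose = validate_email("name@domain.co\n")                        # True
strict = bool(re.fullmatch(EMAIL_PATTERN, "name@domain.co\n"))    # False
print(loose, strict)
```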


In [None]:
# Phone Number Validation
def validate_phone_number(number):
    # Pattern to match common phone number formats
    pattern = ...

    # ^: Start of the string.
    # (\+\d{1,3}[-.\s]?)?: Matches the optional country code part.
        # \+: Matches a literal plus sign '+' at the start, indicating an international code.
        # \d{1,3}: Matches 1 to 3 digits for the country code (e.g., '1' for the US, '44' for the UK).
        # [-.\s]?: Matches an optional separator, which can be a hyphen '-', a dot '.', or a space ' '.
        # ?: Makes the entire country code part optional.
    # (\(?\d{3}\)?[-.\s]?)?: Matches the optional area code part.
        # \(?\d{3}\)?: Matches 3 digits for the area code, which may or may not be enclosed in parentheses. 
            # - \(? : Matches an optional opening parenthesis '('.
            # - \d{3}: Matches exactly 3 digits for the area code.
            # - \)?: Matches an optional closing parenthesis ')'.
        # [-.\s]?: Matches an optional separator (hyphen, dot, or space).
        # ?: Makes the entire area code part optional.
    # (\d{3}[-.\s]?\d{4}): Matches the main phone number part.
        # \d{3}: Matches exactly 3 digits.
        # [-.\s]?: Matches an optional separator (hyphen, dot, or space).
        # \d{4}: Matches exactly 4 digits for the remaining part of the phone number.
    # $: End of the string. Ensures that the pattern matches the entire phone number from start to end.
    
    return bool(re.fullmatch(pattern, number))


# Test the function with a list of phone numbers
numbers = ['+1-800-555-5555',  # Valid: Includes country code and separators.
           '(123) 456 7890',   # Valid: Area code in parentheses and spaces as separators.
           '12345']            # Invalid: Too short to be a valid phone number.

# Validate each phone number using the function
results = [validate_phone_number(number) for number in numbers]

# Display the validation results for each phone number
print(f"Validation Results: {results}")


Validation Results: [True, True, False]
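A sketch of the phone validator, with the pattern assembled from the commented breakdown above (an assumption, since the cell leaves it blank). The final call shows a limitation: the optional `\(?` and `\)?` are matched independently, so an unbalanced parenthesis still passes.

```python
import re

# Optional country code, optional area code, then the 3 + 4 digit main number
PHONE_PATTERN = r"^(\+\d{1,3}[-.\s]?)?(\(?\d{3}\)?[-.\s]?)?(\d{3}[-.\s]?\d{4})$"


def validate_phone_number(number):
    return bool(re.fullmatch(PHONE_PATTERN, number))


numbers = ['+1-800-555-5555', '(123) 456 7890', '12345']
results = [validate_phone_number(n) for n in numbers]
print(results)  # [True, True, False]

# Limitation: the opening "(" has no matching ")" but the number is accepted
unbalanced = validate_phone_number('(123 456 7890')
print(unbalanced)  # True
```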


In [None]:
# Extract URLs
# Extracting URLs from the given text
text = 'Visit our website at https://www.example.com or follow us at http://blog.example.com'

# Define the pattern to match URLs
pattern = ...

# https?://: 
# - https?: Matches the literal 'http' followed optionally by 's'. This means it can match both 'http' and 'https'.
# - ://: Matches the literal characters '://', which are required after 'http' or 'https' in a URL.

# [a-zA-Z0-9./-]+:
# - [a-zA-Z0-9./-]: Character set that matches any of the following characters:
#   - a-z: Lowercase English letters.
#   - A-Z: Uppercase English letters.
#   - 0-9: Digits.
#   - . (dot): Matches the literal dot, which is used in domain names and paths.
#   - / (forward slash): Matches the literal slash, which is used to separate different parts of the URL.
#   - - (hyphen): Matches the literal hyphen, which can be part of domain names or paths.
# - +: Quantifier that matches one or more of the preceding characters in the set, ensuring the pattern matches the entire URL.

urls = re.findall(pattern, text)

print(f"Extracted URLs: {urls}")


Extracted URLs: ['https://www.example.com', 'http://blog.example.com']
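The character class `[a-zA-Z0-9./-]` works for the sample text, but it stops at characters such as `?`, `=` and `&`, and it keeps a trailing sentence period. The sketch below shows both effects on a made-up sentence, plus a broader alternative (an assumption, not part of the notebook) that takes every non-whitespace run after the scheme and trims trailing punctuation.

```python
import re

URL_PATTERN = r"https?://[a-zA-Z0-9./-]+"
text = "See https://example.com/search?q=regex or http://example.org/docs."

# The narrow class drops the query string and keeps the final period
narrow = re.findall(URL_PATTERN, text)
print(narrow)  # ['https://example.com/search', 'http://example.org/docs.']

# Broader alternative: \S+ takes everything up to whitespace, then strip
# common sentence punctuation from the end of each match
broad = [url.rstrip(".,;:!?") for url in re.findall(r"https?://\S+", text)]
print(broad)  # ['https://example.com/search?q=regex', 'http://example.org/docs']
```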


In [None]:
# Extract Birthday
# Sample text containing dates
text = "John's birthday is on 23/05/1995 and Mary's is on 15-04-1992."

# Define the pattern to match date formats
pattern = ...

# \b: Matches a word boundary, ensuring that the pattern matches whole numbers and not parts of larger strings.
    # - This prevents partial matches like '123' in '123abc'.
# \d{1,2} and \d{2,4}:
# - \d: Matches any digit from 0 to 9.
# - {1,2}: Matches one or two digits for the day or month part, allowing numbers like '3' or '23'.
# - {2,4}: Matches two to four digits for the year part, such as '95' or '1995'.
# [-/]: - Matches either a hyphen '-' or a forward slash '/', which are common separators in date formats.

dates = re.findall(pattern, text)

print(f"Extracted Dates: {dates}")


Extracted Dates: ['23/05/1995', '15-04-1992']
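The pattern only checks the shape of a date, so a value such as `31/02/2020` still matches. A hedged sketch: add capture groups for day, month and year (which makes `re.findall` return tuples) and let `datetime` reject impossible dates. It assumes day-first dates with four-digit years, as in the sample.

```python
import re
from datetime import datetime

# Same structure as the date pattern above, with capture groups added
DATE_PATTERN = r"\b(\d{1,2})[-/](\d{1,2})[-/](\d{2,4})\b"

text = "Valid: 23/05/1995 and 15-04-1992. Invalid: 31/02/2020."

valid_dates = []
for day, month, year in re.findall(DATE_PATTERN, text):
    try:
        valid_dates.append(datetime(int(year), int(month), int(day)).date())
    except ValueError:
        # The regex checks the shape only; datetime rejects impossible dates
        pass

print(valid_dates)  # [datetime.date(1995, 5, 23), datetime.date(1992, 4, 15)]
```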


In [None]:
# Split the text into sentences
text = "Hello there! How are you today? Let's learn regex."

# Define the pattern to split the text into sentences
pattern = ...

# Explanation of the pattern:
# [.!?]:
# - [ ]: Square brackets define a character class, which matches any one of the enclosed characters.
# - .: Matches a literal period (.), which marks the end of a sentence (inside [ ], . has no special meaning).
# - !: Matches a literal exclamation mark (!) which marks the end of an exclamatory sentence.
# - ?: Matches a literal question mark (?) which marks the end of a question.

sentences = re.split(pattern, text)
sentences = [sentence.strip() for sentence in sentences if sentence.strip()]

print(f"Sentences: {sentences}")


Sentences: ['Hello there', 'How are you today', "Let's learn regex"]
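Splitting on `[.!?]` drops the punctuation. A variant using a lookbehind, `(?<=[.!?])\s+`, splits on the whitespace after each terminator instead, so the punctuation stays attached. Abbreviations such as `e.g.` still cause false splits, as the second call shows.

```python
import re

text = "Hello there! How are you today? Let's learn regex."

# (?<=[.!?]) is a lookbehind: split only on whitespace that follows . ! or ?
sentences = re.split(r"(?<=[.!?])\s+", text)
print(sentences)  # ['Hello there!', 'How are you today?', "Let's learn regex."]

# Limitation: the period in an abbreviation looks like a sentence end
abbrev = re.split(r"(?<=[.!?])\s+", "Use e.g. regex here.")
print(abbrev)  # ['Use e.g.', 'regex here.']
```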


# Section 1.2: Web Scraping 

Static, Dynamic and APIs:

Static Scraping: This involves extracting data from websites with static content, meaning the data is loaded directly in the HTML and doesn't change after the page loads. Tools:
- requests: To send HTTP requests and fetch webpage content.
- BeautifulSoup: From the bs4 library, used to parse and extract data from HTML documents.
- Example: Scraping a blog page to collect article titles.

Dynamic Scraping: Involves scraping websites where content is loaded dynamically using JavaScript, such as infinite scrolling or loading data after user actions. 
Tools:
- Selenium: Automates browsers, ideal for interacting with dynamic content.
- Playwright: A newer alternative to Selenium that allows for scraping JavaScript-heavy websites efficiently.
- Example: Scraping a news website where articles load as you scroll.

API Scraping: Instead of scraping a website's HTML, you directly interact with a public or private API to retrieve structured data, usually in JSON or XML format. 

Tools:
- requests: Used to send HTTP requests to the API endpoint and fetch the data.
- httpx: A modern HTTP client that supports both synchronous and asynchronous requests.
- Platform-specific packages: for example, Kaggle provides the `kaggle` package, and many platforms, such as LinkedIn, use OAuth for authorized API access.


In summary:

a. Basic scraping using BeautifulSoup  
b. Scraping dynamic content using Selenium  
c. APIs and JSON handling  

## Static Scraping

In [None]:
# Case 1
import requests, re, pandas as pd
from bs4 import BeautifulSoup

# Define the URL of the website to scrape
url = "http://books.toscrape.com/..."

# Send a request to the website
response = ...
soup = BeautifulSoup(...)

# Extract book titles and prices
...
...
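A sketch of the parse step for Case 1, run on an inline HTML snippet so it works offline. The markup (`article.product_pod`, `p.price_color`) is an assumption modeled on a typical product listing, and the books are made up; inspect the real page with your browser's developer tools before relying on these selectors.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup modeled on a product listing page (selectors are assumptions)
html = """
<article class="product_pod">
  <h3><a href="book-a/index.html" title="Sample Book A With a Long Title">Sample Book A With a ...</a></h3>
  <p class="price_color">£12.50</p>
</article>
<article class="product_pod">
  <h3><a href="book-b/index.html" title="Sample Book B">Sample Book B</a></h3>
  <p class="price_color">£7.99</p>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

books = []
for article in soup.find_all("article", class_="product_pod"):
    link = article.h3.a
    # The visible link text can be truncated, so prefer the title attribute
    title = link.get("title", link.get_text(strip=True))
    price_text = article.find("p", class_="price_color").get_text(strip=True)
    # Keep only digits and the decimal point, so currency symbols are ignored
    price = float(re.sub(r"[^\d.]", "", price_text))
    books.append({"title": title, "price": price})

print(books)
```

The list of dicts converts directly into a table with `pd.DataFrame(books)`.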


In [23]:
# define the target url
URL = "https://www.dailymetalprice.com/metalprices.php"

# id of the table body that holds the prices
TABLE_ID = 'rate-table'

# get the response from the URL
page_response = requests.get(URL, timeout=10)

# parse the response using Beautiful Soup
soup = BeautifulSoup(page_response.content, 'html.parser')

# region of interest: the <tbody> element with id="rate-table"
tbody = soup.find(id=TABLE_ID)

# Check if the tbody was found
if tbody:
    print(tbody)
    rows = tbody.find_all('tr')  # Find all rows in the tbody
    for row in rows:
        columns = row.find_all('td')  # Extract each column in the row
        data = [col.get_text(strip=True) for col in columns]  # Extract text from each column
        print(data)  # Print or process the row's data
else:
    print("tbody with ID 'rate-table' not found.")

<tbody id="rate-table">
</tbody>


In [None]:
soup # View the soup

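## Dynamic Scraping: Handling JavaScript-Rendered Pages

The `rate-table` body above came back empty, which suggests its rows are added by JavaScript after the page loads. Selenium drives a real browser, so that JavaScript runs before we read the HTML.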

In [None]:

from selenium import webdriver
# from selenium.webdriver.common.action_chains import ActionChains 
# from selenium.webdriver.edge.options import Options
# from selenium.webdriver.common.by import By
# from selenium.webdriver.common.keys import Keys

from bs4 import BeautifulSoup
import time

# Set up Selenium WebDriver (Chrome in this case)
driver = ... # Chrome WebDriver; Selenium 4.6+ downloads a matching ChromeDriver automatically, older versions need it installed

# Open the webpage
URL = "https://www.dailymetalprice.com/metalprices.php"
...

# Allow time for the page to load fully (you can use WebDriverWait for better control)
time.sleep(5)  # You can adjust the sleep time depending on page load speed

# Get the page source after JavaScript execution
page_source = ...

# parse the fully rendered HTML
soup = ...

# Extract tables

# Close the browser after scraping
...


## API Handling

In [None]:
# Google Books API - Enriching book data
api_url = "https://www.googleapis.com/books/v1/volumes"
params = {
    'q': 'mystery',
    #'key': 'YOUR_API_KEY'  # optional; keep real API keys out of shared notebooks
}

response = ...
books_data = ... # to json()

# Extract relevant information
...

# Convert to DataFrame and merge
api_books_df = pd.DataFrame(...)


In [None]:
api_books_df

| | Title | Authors | PublishedDate | Description |
|---|---|---|---|---|
| 0 | MYSTERY & DETECTIVE COLLECTION: The Winning Cl... | [James Hay] | 2016-07-17 | This carefully crafted ebook: "MYSTERY & DETEC... |
| 1 | The S. P. Mystery | [Harriet Pyne Grove] | 1930-01-01 | |
| 2 | Fresh Slices | [New York Tri-State Chapter of Sisters in Crim... | 2014-08-14 | Slices of life beyond the tourist's view. By t... |
| 3 | An Act of Villainy | [Ashley Weaver] | 2018-09-04 | Edgar Award-shortlisted author Ashley Weaver r... |
| 4 | The Mysterious Affair at Styles: a Hercule Poi... | [Agatha Christie] | 2021-10-21 | Hercule Poirot solves his first case in the Ag... |
| 5 | The Snatch | [Bill Pronzini] | 2011-10 | The author's second novel and the first in the... |
| 6 | THE CHAMPDOCE MYSTERY | [EMILE GABORIAU] | 1913 | A classic gem of the detective-fiction genre o... |
| 7 | Union Jacked | [Diane Vallere] | 2021-01-04 | National bestselling author Diane Vallere brin... |
| 8 | They Came for Him | [P.D. Workman] | | A mystery thriller from USA Today bestselling ... |
| 9 | The Mystery of Three Quarters | [Sophie Hannah] | 2019-08-06 | The world’s most beloved detective, Hercule Po... |
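A sketch of the elided extraction step, assuming the usual shape of the Google Books volumes response (`items[].volumeInfo` with `title`, `authors`, `publishedDate` and `description`). The sample payload is hypothetical. Using `.get` matters here: as the table above shows, some books have no description or publication date.

```python
# Hypothetical, trimmed payload in the shape of a volumes search response
books_data = {
    "items": [
        {"volumeInfo": {"title": "Example Mystery", "authors": ["A. Writer"],
                        "publishedDate": "2020-01-01", "description": "A short blurb."}},
        {"volumeInfo": {"title": "Untitled Draft"}},  # optional fields missing
    ]
}


def extract_books(payload):
    """Flatten each volume's volumeInfo into one row per book."""
    rows = []
    # Default to [] in case a search returns no "items" key
    for item in payload.get("items", []):
        info = item.get("volumeInfo", {})
        rows.append({
            "Title": info.get("title"),
            "Authors": info.get("authors", []),
            "PublishedDate": info.get("publishedDate"),
            "Description": info.get("description"),
        })
    return rows


rows = extract_books(books_data)
print(rows)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)`.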


### API Handling Ex2

In [None]:
# Weather API
import os
import requests, json, pandas as pd

# Params
API_KEY = os.environ.get("WWO_API_KEY")  # hypothetical variable name; keep real keys out of the notebook
LOCATION = "Alexandria Egypt"

NUM_OF_DAYS = 30
DATE = "2023-04-01"
ENDDATE = "2023-04-30"
URL = "http://api.worldweatheronline.com/premium/v1/past-weather.ashx?key={}&q={}&format=json&date={}&enddate={}&tp=24".\
    format(...)
response = requests.get(URL)
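Building the query string by hand with `str.format` works, but the standard library can encode it for you, including the space in `LOCATION`. A minimal sketch with placeholder values (`YOUR_API_KEY` is not a real key); passing the same dictionary as `requests.get(BASE_URL, params=params)` lets `requests` do the encoding instead.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.worldweatheronline.com/premium/v1/past-weather.ashx"

# Placeholder values mirroring the parameters in the cell above
params = {
    "key": "YOUR_API_KEY",
    "q": "Alexandria Egypt",
    "format": "json",
    "date": "2023-04-01",
    "enddate": "2023-04-30",
    "tp": 24,
}

# urlencode escapes each value; spaces become '+'
url = f"{BASE_URL}?{urlencode(params)}"
print(url)
```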

In [None]:
response.text

'{"data":{"request":[{"type":"City","query":"Alexandria, Egypt"}],"weather":[{"date":"2023-04-01","astronomy":[{"sunrise":"05:50 AM","sunset":"06:19 PM","moonrise":"02:02 PM","moonset":"03:29 AM","moon_phase":"Waxing Gibbous","moon_illumination":"74"}],"maxtempC":"22","maxtempF":"72","mintempC":"13","mintempF":"55","avgtempC":"17","avgtempF":"63","totalSnow_cm":"0.0","sunHour":"13.0","uvIndex":"4","hourly":[{"time":"24","tempC":"22","tempF":"72","windspeedMiles":"7","windspeedKmph":"11","winddirDegree":"264","winddir16Point":"W","weatherCode":"119","weatherIconUrl":[{"value":"https://cdn.worldweatheronline.com/images/wsymbols01_png_64/wsymbol_0003_white_cloud.png"}],"weatherDesc":[{"value":"Cloudy"}],"precipMM":"0.0","precipInches":"0.0","humidity":"64","visibility":"10","visibilityMiles":"6","pressure":"1019","pressureInches":"30","cloudcover":"19","HeatIndexC":"18","HeatIndexF":"64","DewPointC":"10","DewPointF":"51","WindChillC":"17","WindChillF":"63","WindGustMiles":"10","WindGustKm

In [None]:
# Parse the JSON data
data = json.loads(response.text)

# Extract relevant weather information
weather_data = data['data']['weather']

# Create a pandas DataFrame
df = pd.DataFrame(weather_data)

In [None]:
df.head()

| | date | astronomy | maxtempC | maxtempF | mintempC | mintempF | avgtempC | avgtempF | totalSnow_cm | sunHour | uvIndex | hourly |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-04-01 | [{'sunrise': '05:50 AM', 'sunset': '06:19 PM',... | 22 | 72 | 13 | 55 | 17 | 63 | 0.0 | 13.0 | 4 | [{'time': '24', 'tempC': '22', 'tempF': '72', ... |
| 1 | 2023-04-02 | [{'sunrise': '05:49 AM', 'sunset': '06:20 PM',... | 25 | 76 | 13 | 55 | 18 | 65 | 0.0 | 13.0 | 5 | [{'time': '24', 'tempC': '25', 'tempF': '76', ... |
| 2 | 2023-04-03 | [{'sunrise': '05:47 AM', 'sunset': '06:20 PM',... | 28 | 82 | 15 | 59 | 20 | 69 | 0.0 | 13.0 | 6 | [{'time': '24', 'tempC': '28', 'tempF': '82', ... |
| 3 | 2023-04-04 | [{'sunrise': '05:46 AM', 'sunset': '06:21 PM',... | 24 | 75 | 18 | 64 | 20 | 68 | 0.0 | 13.0 | 5 | [{'time': '24', 'tempC': '24', 'tempF': '75', ... |
| 4 | 2023-04-05 | [{'sunrise': '05:45 AM', 'sunset': '06:22 PM',... | 23 | 73 | 14 | 58 | 18 | 65 | 0.0 | 13.0 | 5 | [{'time': '24', 'tempC': '23', 'tempF': '73', ... |
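The DataFrame above keeps nested lists in `astronomy` and `hourly`, and every number arrives as a string. A hedged sketch of flattening one record per day into typed columns, using a trimmed copy of the first `weather` entry from the response shown earlier; which fields to keep is an arbitrary choice for illustration.

```python
# Trimmed copy of the first "weather" entry from the API response above
weather_data = [
    {
        "date": "2023-04-01",
        "maxtempC": "22",
        "mintempC": "13",
        "avgtempC": "17",
        "hourly": [
            {"humidity": "64", "precipMM": "0.0",
             "weatherDesc": [{"value": "Cloudy"}]},
        ],
    },
]


def flatten_day(day):
    """Turn one nested daily record into a flat dict with numeric types."""
    # With tp=24 the response above shows a single "hourly" entry per day
    hourly = day["hourly"][0]
    return {
        "date": day["date"],
        # The API returns numbers as strings, so convert them explicitly
        "maxtempC": int(day["maxtempC"]),
        "mintempC": int(day["mintempC"]),
        "avgtempC": int(day["avgtempC"]),
        "humidity": int(hourly["humidity"]),
        "precipMM": float(hourly["precipMM"]),
        "description": hourly["weatherDesc"][0]["value"],
    }


rows = [flatten_day(day) for day in weather_data]
print(rows)
```

Passing `rows` to `pd.DataFrame` then gives one numeric column per field instead of nested lists.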


# Section 2: End-to-End Deployment of a Machine Learning Model with Docker

- What is Docker? Docker is a set of platform-as-a-service (PaaS) products that use OS-level virtualization to deliver software in packages called containers.
    - Docker Components:
        - Client: The interface to interact with Docker. It sends commands to the Docker daemon (e.g., building images, running containers).
        - Daemon: The server-side component of Docker, responsible for managing Docker objects (images, containers, networks, volumes).
        - Container: A lightweight, portable, and isolated execution environment for the application.
        - Image: A read-only template with the application and its dependencies, used to create containers.
        - Dockerfile: A script that defines the instructions to build a Docker image.
        - Network: Facilitates communication between Docker containers and external networks.
        - Volume: Allows for persistent storage, even if containers are stopped or removed.
        - Registry: A repository to store and distribute Docker images (e.g., Docker Hub).
        - Host: The machine where Docker daemon and containers run.
        - Plugins: Extensions to enhance Docker's functionality (e.g., for monitoring, logging).

<div class="image-container">
  <img src="docker_components.png"/>
</div>
<style>
.image-container {
  display: flex;
  justify-content: center;
}
</style>

## 1. Setup the Project Directory

#### Organize your project directory as follows:
                    my_ml_project/
                    |-- Dockerfile
                    |-- fastapi-app/
                    |   |-- main.py
                    |   |-- model.h5
                    |-- requirements.txt


## 2. Create the Dockerfile
#### A Dockerfile is a script that contains instructions to assemble a Docker image. It defines the base image, dependencies, and other configurations needed to set up the environment.
- Note: The WORKDIR instruction sets the working directory for any RUN, CMD, ENTRYPOINT, COPY and ADD instructions that follow it in the Dockerfile. If the WORKDIR doesn’t exist, it will be created even if it’s not used in any subsequent Dockerfile instruction.
```dockerfile

# base image of python 3.11
FROM python:3.11-slim

# Working directory
WORKDIR /fastapi-app

# Copying the current directory into container
COPY . /fastapi-app

# Installing dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Expose the port that FastAPI server will run on
EXPOSE 80

# CMD Pipeline running the server
CMD ["uvicorn", "fastapi-app.main:app","--host","0.0.0.0", "--port", "80", "--reload"]


# docker build -t fastapi-app .
# docker run -d -p 8000:80 -v ${pwd}:/fastapi-app fastapi-app


```

## 3. Create the Requirements File

#### requirements.txt lists all the Python dependencies required for the application. You can generate it using:
```
pip freeze > requirements.txt
```

Example of requirements.txt:
```
fastapi
uvicorn[standard]
tensorflow
numpy
```



## 4. Application Code (fastapi-app/main.py)
#### This script loads the trained model and handles HTTP requests.
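```python
from pathlib import Path

import numpy as np
import tensorflow as tf
from fastapi import FastAPI  # FastAPI application class
from pydantic import BaseModel

# Load the trained model once at startup; resolve the path relative to this
# file so it works regardless of the directory uvicorn is started from
MODEL_PATH = Path(__file__).resolve().parent / 'model.h5'
model = tf.keras.models.load_model(str(MODEL_PATH))

app = FastAPI()  # app is now an instance of FastAPI()


class PredictRequest(BaseModel):
    input: list[float]  # request body: {"input": [1.0, 2.0, ...]}


@app.post('/predict')
def predict(request: PredictRequest):
    # Reshape the flat feature list into a batch of one sample
    input_data = np.array(request.input).reshape(1, -1)
    prediction = model.predict(input_data)
    # FastAPI serializes the returned dict to JSON
    return {'prediction': prediction.tolist()}
```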
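The `reshape(1, -1)` call in `predict` is easy to miss: Keras models predict on batches shaped `(batch_size, n_features)`, while the request carries one flat list. A small NumPy check of the shapes, using the same input as the curl example in step 6:

```python
import numpy as np

payload = {"input": [1.0, 2.0, 3.0, 4.0]}

flat = np.array(payload["input"])  # shape (4,): a single unbatched vector
# -1 lets NumPy infer the number of features from the data
batch = flat.reshape(1, -1)        # shape (1, 4): a batch containing one sample
print(flat.shape, batch.shape)     # (4,) (1, 4)
```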

## 5. Building and Running the Docker Image
#### 1. Build the Docker Image:
```
    docker build -t fastapi-app .
```

- Client: Sends the build command to the Docker daemon.
- Daemon: Builds the image according to the instructions in the Dockerfile.
#### 2. Run the Docker Container:
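```
    docker run -d -p 8000:80 -v ${pwd}:/fastapi-app fastapi-app
```
Here `-p 8000:80` maps port 8000 on the host to port 80 in the container, and `-v` mounts the project directory so `--reload` picks up code changes. `${pwd}` is PowerShell syntax; in bash, use `$(pwd)`.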

- Container: A running instance of the Docker image, which contains the application and its dependencies.
- Network: The container is connected to a network, allowing communication between containers or external services.
- Volume: If needed, you can mount volumes for persistent storage (e.g., to save predictions).
- Host: The machine running the Docker daemon and containers.
- Plugins: Can be used for extended capabilities (e.g., logging, monitoring).

## 6. Using the Model API
#### Once the container is running, you can send a request to the API to get predictions:

```
curl -X POST -H "Content-Type: application/json" -d '{"input": [1.0, 2.0, 3.0, 4.0]}' http://localhost:8000/predict
```
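If you would rather call the API from Python than from curl, here is a standard-library sketch that builds the same POST request. It targets host port 8000, from the `-p 8000:80` mapping in the `docker run` command; `build_predict_request` is a hypothetical helper name. Sending is left commented out because it needs the container to be running.

```python
import json
import urllib.request


def build_predict_request(url, values):
    """Build (but do not send) a POST request equivalent to the curl command."""
    body = json.dumps({"input": values}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request("http://localhost:8000/predict", [1.0, 2.0, 3.0, 4.0])
print(req.get_method(), req.full_url)

# With the container running, send it and decode the JSON response:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```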

## 7. Best Practices and Considerations

- Security: Keep the Docker image small and include only necessary dependencies. Use a minimal base image like python:3.11-slim.
- Environment Variables: Use environment variables for configuration and sensitive information.
- Testing: Test the Docker container locally before deploying to production.
- Automation: Use CI/CD pipelines for automated testing and deployment.
- Monitoring: Use Docker plugins or external tools for monitoring container health and performance.
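To make the environment-variable advice concrete, here is a minimal sketch of reading settings with defaults. The variable names (`MODEL_PATH`, `PORT`, `API_KEY`) are hypothetical; values can be passed into a container with `docker run -e NAME=value` or `--env-file`.

```python
import os


def load_settings(env=None):
    """Read settings from environment variables, falling back to defaults.

    The variable names are hypothetical examples, not ones the app requires.
    """
    # Default to the real process environment; a dict can be passed for testing
    env = os.environ if env is None else env
    return {
        "model_path": env.get("MODEL_PATH", "model.h5"),
        "port": int(env.get("PORT", "80")),
        # No default for secrets, so a missing key shows up as None
        "api_key": env.get("API_KEY"),
    }


settings = load_settings({"PORT": "8000", "API_KEY": "example-key"})
print(settings)  # {'model_path': 'model.h5', 'port': 8000, 'api_key': 'example-key'}
```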

## 8. Scalable app

## Controlling containers with the Docker SDK for Python

https://docker-py.readthedocs.io/en/stable/

In [None]:

import docker
client = docker.from_env()
containers = client.containers.list(all=True)
for container in containers:
    print(f"ID: {container.id}, Name: {container.name}, Status: {container.status}")

## Build scalable app

- Backend framework: pip install fastapi uvicorn
- Authentication with JSON Web Tokens (JWT): pip install "python-jose[cryptography]" "passlib[bcrypt]"
- Databases: pip install sqlalchemy databases pydantic
- Database Migration: pip install alembic
- Databases (PostgreSQL for scalability): pip install asyncpg psycopg2
- GraphQL: pip install graphene
- HTTP client: pip install httpx
- Redis: pip install redis (the former aioredis package has been merged into redis-py as `redis.asyncio`)
- Environment Variable: pip install python-dotenv

- Note: Graphene is a Python library used to build GraphQL APIs in Python applications. GraphQL is a query language for APIs that allows clients to request exactly the data they need, and nothing more. Unlike REST, which exposes endpoints that return predefined responses, GraphQL gives clients the flexibility to shape the response structure, making it more efficient for querying complex datasets.

### Folder structure basic startup example
    APP/
    │
    ├── alembic/               # Alembic for database migrations (already there)
    │   ├── versions/          # Migration versions (generated by Alembic)
    │   ├── env.py             # Alembic environment configuration
    │   ├── script.py.mako     # Alembic script template
    │   └── README             # Info for Alembic setup
    │
    ├── app/                   # FastAPI application code
    │   ├── api/               # REST API routes
    │   │   ├── v1/            # Versioning for API routes
    │   │   │   ├── auth.py    # Authentication route (JWT)
    │   │   │   └── users.py   # Users route (CRUD)
    │   ├── core/              # Core utilities and settings
    │   │   └── config.py      # App configuration and settings
    │   ├── db/                # Database logic
    │   │   ├── models.py      # SQLAlchemy models
    │   │   └── database.py    # Database connection logic
    │   └── main.py            # Entry point of FastAPI app
    │
    ├── alembic.ini            # Alembic configuration file (already there)
    ├── Dockerfile             # Dockerfile to containerize the application
    ├── docker-compose.yml     # Docker Compose file for the app, db, and redis
    └── requirements.txt       # Python dependencies

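- In the project directory:
    - Add a .gitignore that excludes:
        - .env
        - venv
    - (On Windows, install Git Bash to run the commands below.)
    - Inside the repo:
        - git init . # Initialize version control
        - git remote add origin "repo" # Link the local repo to the remote repository
        - git add .
        - git commit -m "message"
        - git branch -M main # Make sure the current branch is named main
        - git push -f origin main # -f overwrites the remote branch history
        - git branch staging
        - git checkout staging
        - git rebase main
        - git push -f origin staging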
        

