# Welcome to your first assignment!

Instructions are below. Please give this a try, and look in the solutions folder if you get stuck (or feel free to ask me!)

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Just before we get to the assignment --</h2>
            <span style="color:#f71;">I thought I'd take a second to point you at this page of useful resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

# HOMEWORK EXERCISE ASSIGNMENT

Upgrade the day 1 webpage summarization project to use an open-source model running locally via Ollama instead of OpenAI.

You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.

**Benefits:**
1. No API charges, since the model is open-source and runs locally
2. Data doesn't leave your machine

**Disadvantages:**
1. Significantly less powerful than a frontier model

## Recap on installation of Ollama

Simply visit [ollama.com](https://ollama.com) and install!

Once complete, the ollama server should already be running locally.  
If you visit:  
[http://localhost:11434/](http://localhost:11434/)

You should see the message `Ollama is running`.  

If not, bring up a new Terminal (Mac) or PowerShell (Windows) and enter `ollama serve`  
And in another Terminal (Mac) or PowerShell (Windows), enter `ollama pull llama3.2`  
Then try [http://localhost:11434/](http://localhost:11434/) again.

If Ollama is slow on your machine, try using `llama3.2:1b` as an alternative. Run `ollama pull llama3.2:1b` from a Terminal or PowerShell, and change the code below from `MODEL = "llama3.2"` to `MODEL = "llama3.2:1b"`.
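
The browser check described above can also be done from Python. This is a minimal sketch using only the standard library; `ollama_is_running` is a hypothetical helper name, and the default URL is the one given in this notebook:

```python
import urllib.request


def ollama_is_running(base_url="http://localhost:11434/", timeout=2):
    """Return True if the server answers with the expected status message."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return "Ollama is running" in body
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors (all OSError subclasses)
        return False


print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, starting the server with `ollama serve` as described above is the first thing to try.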

In [1]:
# imports

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI
import ollama
print('Loaded successfully !!!!!!!!')

Loaded successfully !!!!!!!!


In [2]:
# Constants

OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2"
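
The cells below use the `ollama` Python package, so `OLLAMA_API` and `HEADERS` are not strictly needed. As a rough sketch of how they could be used to call the REST endpoint directly, here is a standard-library version; `build_chat_payload` and `chat_via_rest` are hypothetical helper names, and the request assumes the server is running locally with the model already pulled:

```python
import json
import urllib.request

OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2"


def build_chat_payload(messages, model=MODEL):
    """Build the JSON body for a non-streaming chat request."""
    return {"model": model, "messages": messages, "stream": False}


def chat_via_rest(messages, model=MODEL, timeout=120):
    """POST the payload to the local Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(messages, model)).encode("utf-8")
    request = urllib.request.Request(OLLAMA_API, data=body, headers=HEADERS, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["message"]["content"]


# Building the payload needs no server; chat_via_rest(...) does.
payload = build_chat_payload([{"role": "user", "content": "Say hello in five words."}])
print(payload["model"], payload["stream"])
```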

In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [4]:
ed = Website("https://edwarddonner.com")
print(ed.get_contents())

Webpage Title:
Home - Edward Donner
Webpage Contents:
Home
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of pr

In [5]:
# all links found on the page
print("Links on the webpage:")
print(ed.links)

Links on the webpage:
['https://edwarddonner.com/', 'https://edwarddonner.com/outsmart/', 'https://edwarddonner.com/about-me-and-about-nebula/', 'https://edwarddonner.com/posts/', 'https://edwarddonner.com/', 'https://news.ycombinator.com', 'https://nebula.io/?utm_source=ed&utm_medium=referral', 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html', 'https://patents.google.com/patent/US20210049536A1/', 'https://www.linkedin.com/in/eddonner/', 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/', 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/', 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/', 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/', 'https://edwarddonner.com/2024/10/16/from-software-engineer-to-ai-data-scientist-resources/', 'https://edwarddonner.com/2024/10/16/from-software-engineer-to-ai-data-scientist-resources/', 'https://edwarddonne

In [6]:
eclectics = Website("https://eclectics.io/")
print(eclectics.get_contents())

Webpage Title:
Eclectics International – Simplifying Lives Digitally
Webpage Contents:
Have any questions?
+254 709 646 000
talktous@eclectics.io
Support Portal
Home
About Us
About Us
Our Team
Awards and Certifications
Clients
Solutions
Solutions
eConnect ESB Gateway
Agency Banking
Internet Banking
Mobile Banking
Tijara
Advancys
SME Automation
Remittances
Supply Chain Automation
FMCG
Information Security
Industries
Insurance
SACCOs
Public Sector
Agritech
Enterprise Business Suite
Microsoft
Power-BI
Microsoft 365 Business
Microsoft Azure
Microsoft Dynamic CRM
Microsoft Dynamics ERP
Microsoft Licensing
Microsoft Teams
Enterprise Business Unit
Case Studies
Contact Us
🇰🇪
Ke
🇺🇬
UG
🇹🇿
TZ
🇳🇬
NG
En
En
De
Es
Contact
Eclectics
Home
About Us
About Us
Our Team
Awards and Certifications
Clients
Solutions
Solutions
eConnect ESB Gateway
Agency Banking
Internet Banking
Mobile Banking
Tijara
Advancys
SME Automation
Remittances
Supply Chain Automation
FMCG
Information Security
Industries
Insurance
SACCO

In [7]:
# # all links found on the page
# print("Links on the webpage:")
# print(eclectics.links)

## First step: use Ollama to figure out which links are relevant

In [8]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [9]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [10]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} -"
    user_prompt += "\nPlease decide which of these are relevant web links for a brochure about the company, and respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, or email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [11]:
print(get_links_user_prompt(eclectics))

Here is the list of links on the website of https://eclectics.io/ -
Please decide which of these are relevant web links for a brochure about the company, and respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, or email links.
Links (some might be relative links):
tel:00%20(123)%20456%2078%2090
mailto:talktous@eclectic.io
https://cxsupport.ekenya.co.ke/crmportal/backend/web/site/login
https://www.facebook.com/share/1AqsAcSZee/?mibextid=LQQJ4d
https://x.com/eclecticsio?s=21&t=kQ5UMyfIOttnBc-O9ZiOnQ
https://www.linkedin.com/company/eclecticsio/
https://m.youtube.com/channel/UCVb9gwiOP842lDdyz68cKVA
https://eclectics.io/
https://eclectics.io/
#
https://eclectics.io/about/
https://eclectics.io/our-team/
https://eclectics.io/awards-and-certifications/
https://eclectics.io/clients/
#
#
https://eclectics.io/esb/
https://eclectics.io/agency-banking/
https://eclectics.io/internet-banking/
https://eclectics.io/mobile-banking/
https://eclectics.io/tijara/
https:
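
The link list above contains `#` anchors, `tel:` and `mailto:` entries, duplicates, and possibly relative links. Here is a minimal sketch of one way to tidy the list before sending it to the model, using only the standard library; `normalize_links` is a hypothetical helper, not part of the original notebook:

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_links(base_url, links):
    """Resolve relative links, drop non-web schemes, and remove duplicates."""
    seen = set()
    cleaned = []
    for link in links:
        # Resolve relative paths against the page URL and drop any #fragment
        absolute, _fragment = urldefrag(urljoin(base_url, link.strip()))
        # Keep only http(s) links; this filters out tel:, mailto:, and similar
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned


sample = ["tel:00%20(123)", "mailto:talktous@eclectic.io", "#", "/about/",
          "https://eclectics.io/", "https://eclectics.io/"]
print(normalize_links("https://eclectics.io/", sample))
```

A shorter, cleaner list leaves a small local model less to get confused by, and leaves more room in the context window for the page text.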

In [12]:
def get_links(url):
    website = Website(url)
    messages = [
        {"role": "system", "content": link_system_prompt},
        {"role": "user", "content": get_links_user_prompt(website)}
    ]
    response = ollama.chat(
        model=MODEL,
        messages=messages
    )

    # the response content
    result = response.message.content

    # Handle empty responses
    if not result.strip():
        raise ValueError("Response content is empty or invalid.")

    # Attempt to parse as JSON; on failure, return an error dictionary
    # so that callers such as get_all_details can check for the "error" key
    try:
        return json.loads(result)
    except json.JSONDecodeError as e:
        return {"error": f"Response is not valid JSON: {e}", "raw_content": result}


In [13]:
# get_links("https://eclectics.io/")

In [14]:
# get_links("https://edwarddonner.com")
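
Small local models often wrap their JSON in a code fence or surround it with a sentence or two, which makes a plain `json.loads` call fail. Here is a minimal sketch of a more forgiving parser, assuming the reply contains a single JSON object somewhere in the text; `extract_json_object` is a hypothetical helper:

```python
import json


def extract_json_object(text):
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value at `start` and ignores trailing text
            value, _end = decoder.raw_decode(text, start)
            if isinstance(value, dict):
                return value
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return None


reply = 'Here are the links:\n{"links": [{"type": "about page", "url": "https://eclectics.io/about/"}]}\nHope this helps!'
print(extract_json_object(reply))
```

The `ollama` package also accepts a `format="json"` argument to `ollama.chat`, which may make this fallback unnecessary; treat that as something to verify against the version you have installed.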

## Second step: make the brochure!

In [15]:
def get_all_details(url):
    result = "Landing page:\n"

    # Fetch and add the content of the landing page
    try:
        result += Website(url).get_contents()
    except Exception as e:
        print(f"Error fetching landing page content: {e}")
        result += f"\n[Error fetching landing page content: {e}]\n"

    # Get links from the website
    links = get_links(url)

    # Check if an error occurred in get_links
    if "error" in links:
        print(f"Error from get_links: {links['error']}")
        result += f"\n[Error fetching links: {links.get('raw_content', 'No additional details available.')}]\n"
        return result

    # Ensure the "links" key exists and is a list
    if "links" not in links or not isinstance(links["links"], list):
        raise ValueError("get_links returned an invalid format. Expected a dictionary with a 'links' key containing a list.")

    # Fetch and add the contents of each link
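    for link in links["links"]:
        # Look up the URL defensively so the error handler below cannot itself fail
        link_url = link.get("url", "unknown URL") if isinstance(link, dict) else str(link)
        try:
            result += f"\n\n{link['type']}\n"
            result += Website(link["url"]).get_contents()
        except Exception as e:
            print(f"Error fetching content for link {link_url}: {e}")
            result += f"\n[Error fetching content for {link_url}: {e}]\n"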

    return result

In [16]:
# print(get_all_details("https://eclectics.io/"))

In [17]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."


In [18]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt

In [19]:
# get_brochure_user_prompt("Eclectics", "https://eclectics.io/")
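
The 5,000-character cut in `get_brochure_user_prompt` can end mid-word or mid-line. Here is a minimal sketch of truncating at the last complete line instead; `truncate_at_line` is a hypothetical helper, and the default limit is simply the value used in this notebook:

```python
def truncate_at_line(text, limit=5_000):
    """Cut text to at most `limit` characters, ending on a line break when possible."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_newline = cut.rfind("\n")
    # Fall back to a hard cut if the first `limit` characters contain no usable line break
    return cut[:last_newline] if last_newline > 0 else cut


sample = "Landing page:\nHome\nAbout Us\nOur Team"
print(repr(truncate_at_line(sample, limit=25)))
```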

In [21]:
# brochure = create_brochure("Hugging Face", "https://huggingface.co")
# brochure

## Streaming

In [22]:
def stream_brochure(company_name, url):
    stream = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        # Ollama stream chunks have the same shape as a full response (message.content)
        response += chunk.message.content or ''
        response = response.replace("```","").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)

In [23]:
# stream_brochure("Eclectics", "https://eclectics.io/")

In [24]:

# stream_brochure("HuggingFace", "https://huggingface.co")
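
The streaming loop above removes every occurrence of the word `markdown`, including legitimate ones inside the brochure text. Here is a minimal sketch of a narrower cleanup that only strips a wrapping code fence; `strip_wrapping_fence` is a hypothetical helper:

```python
FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed


def strip_wrapping_fence(text):
    """Remove a wrapping code fence (optionally tagged markdown) around text."""
    lines = text.strip().splitlines()
    if lines and lines[0].strip() in (FENCE, FENCE + "markdown", FENCE + "md"):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


wrapped = FENCE + "markdown\n# Eclectics\nWe write markdown brochures.\n" + FENCE
print(strip_wrapping_fence(wrapped))
```

It can be applied to the accumulated `response` on each chunk, in place of the two `replace` calls.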

In [25]:
def create_brochure(company_name, url):
    """
    Creates a brochure based on the provided company name and URL.
    """
    response = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)},
        ],
    )
    result = response.message.content
    display(Markdown(result))  # Display the brochure in Markdown
    return result  # Return the content for further use


def translate_brochure_to_french(brochure_text):
    """
    Translates a given brochure text to French using the Ollama model.
    """
    messages = [
        {"role": "system", "content": "You are a translator skilled in French."},
        {
            "role": "user",
            "content": f"Translate the following text into French:\n\n{brochure_text}",
        },
    ]
    
    # Send the request for translation
    response = ollama.chat(model=MODEL, messages=messages)
    
    # Debugging output (optional)
    # print("Raw response from translation:", response)
    
    # Ensure the response contains the translated text
    translated_text = response.message.content if response.message else "No translation available."
    display(Markdown(translated_text))  # Display the translated text in Markdown
    return translated_text

In [27]:

# Create and Translate the Brochure
brochure_text = create_brochure("Hugging Face", "https://huggingface.co")
print('='*100)
translated_brochure = translate_brochure_to_french(brochure_text)


# Hugging Face Brochure
==========================

[Image description: A screenshot of the Hugging Face logo]

**Welcome to Hugging Face**

Hugging Face is a leader in the field of artificial intelligence (AI) and natural language processing (NLP). Our mission is to make it easier for developers to build, train, and deploy AI models. With our flagship product, Hugging Face Transformers, we enable developers to tap into the power of large language models like BERT, RoBERTa, and XLNet.

**Our Story**

Hugging Face was founded in 2016 by a group of researchers who were passionate about making AI more accessible. We started with a simple idea: to create a platform that would allow developers to easily build and deploy AI models using pre-trained language models. Today, we have grown into one of the largest open-source AI platforms in the world.

**Our Technology**

Hugging Face Transformers is a state-of-the-art library for natural language processing tasks such as text classification, sentiment analysis, and machine translation. Our library provides over 100 pre-trained models, allowing developers to leverage the power of large language models without having to train their own models from scratch.

### Key Features

* **Pre-Trained Models**: Access to a vast library of pre-trained models for various NLP tasks
* **Easy Deployment**: Deploy models with just one line of code using our API
* **Community-Driven**: Our community-driven approach ensures that the best models are always available
* **Scalability**: Handle large volumes of data with ease

**Our Impact**

Hugging Face has had a significant impact on the field of NLP and beyond. We have:

* **Enabled faster AI adoption**: By making it easier for developers to build and deploy AI models, we have accelerated the adoption of AI in industries such as healthcare, finance, and education
* **Advanced NLP research**: Our platform has enabled researchers to experiment with new NLP techniques and push the boundaries of what is possible

**Our Team**

Our team consists of experienced engineers, researchers, and developers who are passionate about making AI more accessible. We have offices in multiple locations around the world and are committed to creating a positive impact through our work.

### Join Our Mission

If you're interested in joining our mission to make AI more accessible, we encourage you to:

* **Explore our library**: Check out our pre-trained models and start building your own NLP projects
* **Join our community**: Participate in our forums and discussions with other developers and researchers
* **Stay up-to-date**: Follow us on social media for the latest news and updates

[Image description: A call-to-action button to visit Hugging Face's website]



Voici le texte traduit en français :

# Hugging Face Brochure
==========================

[Description de l'image : Une capture d'écran du logo Hugging Face]

**Bienvenue chez Hugging Face**

Hugging Face est un leader dans le domaine de l'intelligence artificielle (IA) et du traitement de langage naturel (TNL). Notre mission est de rendre plus facile aux développeurs de construire, entraîner et déployer des modèles d'IA. Avec notre produit flagship, Hugging Face Transformers, nous permettons aux développeurs de profiter du pouvoir des grands modèles de langage tels que BERT, RoBERTa et XLNet.

**Notre Histoire**

Hugging Face a été fondé en 2016 par un groupe de chercheurs passionnés pour rendre l'IA plus accessible. Nous avons commencé avec une idée simple : créer une plateforme qui permettrait aux développeurs de construire et déployer facilement des modèles d'IA à l'aide de modèles de langage pré-entraînés. Aujourd'hui, nous avons grandi pour devenir l'un des plus grands plateformes d'IA open-source au monde.

**Notre Technologie**

Hugging Face Transformers est une bibliothèque avancée pour les tâches de traitement du langage naturel telles que la classification de texte, l'analyse du sentiment et la traduction automatique. Notre bibliothèque offre plus de 100 modèles pré-entraînés, permettant aux développeurs d'en profiter sans avoir à entraîner leurs propres modèles à partir de zéro.

### Caractéristiques clés

* **Modèles pré-entraînés** : Accès à une grande bibliothèque de modèles pré-entraînés pour diverses tâches du TNL
* **Déploiement facile** : Déployer des modèles avec juste une ligne de code en utilisant notre API
* **Approche communautaire** : Notre approche communautaire assure que les meilleurs modèles sont toujours disponibles
* **Échelle** : Gérer des volumes importants de données avec facilité

**Notre Impact**

Hugging Face a eu un impact significatif sur le domaine du TNL et au-delà. Nous avons :

* **Accélééré l'adoption de l'IA** : En rendant plus facile aux développeurs de construire et déployer des modèles d'IA, nous avons accéléré l'adoption de l'IA dans les secteurs tels que la santé, les finances et l'éducation
* **Avancé la recherche en TNL** : Notre plateforme a permis aux chercheurs de expérimenter de nouvelles techniques de traitement du langage naturel et de pousser les limites de ce qui est possible

**Notre Équipe**

Notre équipe se compose d'ingénieurs, de chercheurs et de développeurs expérimentés passionnés par la rendre l'IA plus accessible. Nous avons des bureaux dans plusieurs endroits du monde et nous sommes engagés à créer un impact positif à travers notre travail.

### Rejoignez Notre Mission

Si vous êtes intéressé à rejoindre notre mission pour rendre l'IA plus accessible, nous vous encourageons à :

* **Explorer notre bibliothèque** : Vérifiez nos modèles pré-entraînés et commencez à construire vos propres projets de TNL
* **Rejoindre notre communauté** : Participez aux forums et discussions avec d'autres développeurs et chercheurs
* **Restez à l'affût** : Suivez-nous sur les réseaux sociaux pour les dernières actualités et mises à jour

[Description de l'image : Un bouton de appel à l'action pour visiter le site Web de Hugging Face]

In [30]:
def create_brochure_language(company_name, url, language):
    """
    Asks the model for a brochure about the company, written in the given language.

    Note: only the company name and URL are sent to the model. The page itself is
    not scraped here, so the model cannot see the site and may invent details.
    """
    language_prompt = f"You are a professional translator and writer specializing in creating and translating brochures. Convert the brochure to {language} while maintaining its original tone, format, and purpose."
    user_language_prompt = f"Generate a brochure for the company '{company_name}' available at the URL: {url}, and translate it into {language}."
    response = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": language_prompt},
            {"role": "user", "content": user_language_prompt}
        ],
    )
    result = response.message.content
    display(Markdown(result))
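
Because `create_brochure_language` sends only the company name and URL, the model has no page content to work from. Here is a minimal sketch of how the messages might carry the scraped text instead, assuming `details` holds a string such as the one returned by `get_all_details`; `build_language_messages` is a hypothetical helper:

```python
def build_language_messages(company_name, details, language, limit=5_000):
    """Build chat messages asking for a brochure in `language`, grounded in `details`."""
    system = (
        "You are an assistant that analyzes the contents of several relevant pages "
        "from a company website and creates a short brochure about the company. "
        f"Write the brochure in {language} and respond in markdown. "
        "Use only the information provided."
    )
    user = f"You are looking at a company called: {company_name}\n"
    user += "Here are the contents of its landing page and other relevant pages:\n"
    user += details
    return [
        {"role": "system", "content": system},
        # Truncate the user message, mirroring the 5,000-character cap used earlier
        {"role": "user", "content": user[:limit]},
    ]


messages = build_language_messages(
    "Eclectics", "Landing page:\nAgency Banking\nMobile Banking", "French"
)
print(messages[1]["content"])
```

The resulting list can be passed as the `messages` argument of `ollama.chat`, as in the other cells.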


In [33]:
create_brochure_language("Eclectics", "https://eclectics.io/","French")

**English Version of the Brochure**

[Cover Image: A collage of various art pieces and creative elements]

Welcome to Eclectics

Where Art Meets Innovation

At Eclectics, we're passionate about bringing together diverse art styles, creative techniques, and innovative technologies to produce unique and captivating designs. Our team of skilled artists, designers, and technologists collaborate to push the boundaries of what's possible in visual expression.

**Our Services**

* Graphic Design: From logo creation to branding and marketing materials
* Digital Art: 3D modeling, animation, and visual effects
* Print Design: Brochures, flyers, posters, and more
* Web Design: Responsive websites and e-commerce solutions

**Why Choose Eclectics?**

* Personalized Approach: We take the time to understand your vision and goals
* Expertise: Our team has years of experience in various design fields
* Creativity: We're not afraid to think outside the box
* Quality: We use only the latest software and technologies

**Meet Our Team**

[Image: A photo of the Eclectics team]

* John Doe, Creative Director: With over 10 years of experience in graphic design and digital art.
* Jane Smith, Art Director: Bringing a keen eye for detail and a passion for innovative techniques.
* Bob Johnson, Technical Lead: Expertise in web development and e-commerce solutions.

**Get in Touch**

[Contact Information: Phone number, email address, and physical location]

Contact us to discuss your design needs and see how we can help you achieve your creative vision. We look forward to collaborating with you!

**French Version of the Brochure**

[Cover Image: A collage of various art pieces and creative elements]

Bienvenue chez Eclectics

Où l'Art rencontre la Innovation

À Eclectics, nous sommes passionnés par le mélange de styles artistiques diversifiés, de techniques créatives et d'innovations technologiques pour produire des designs uniques et captivants. Notre équipe d'artistes, de designers et de techniciens expérimentés s'accorde pour pousser les limites du what's possible dans l'expression visuelle.

**Nos Services**

* Conception Graphique : Création de logos, branding et matériel publicitaire
* Art numérique : Modélisation 3D, animation et effets visuels
* Conception Imprimée : Brochures, affiches, posters et plus
* Conception Web : Sites web réactifs et solutions e-commerce

**Pourquoi choisir Eclectics ?**

* Approche personnalisée : Nous prenons le temps de comprendre votre vision et vos objectifs
* Expertise : Notre équipe dispose d'années d'expérience dans différents domaines de la conception graphique
* Créativité : Nous n'avons pas peur de penser à l'extérieur du boîtier
* Qualité : Nous utilisons uniquement les dernières logiciels et technologies

**Rencontrez notre Équipe**

[Image: Une photo de l'équipe d'Eclectics]

* Jean-Doe, Directeur Créatif : Avec plus de 10 ans d'expérience dans la conception graphique et l'art numérique.
* Jane Smith, Responsable Artistique : Un œil aiguisé pour les détails et une passion pour les techniques innovantes.
* Bob Johnson, Chef Technique : Expertise dans le développement web et les solutions e-commerce.

**Contactez-nous**

[Informations de Contact : Numéro de téléphone, adresse email et emplacement physique]

Contactez-nous pour discuter de vos besoins de conception et voyez comment nous pouvons vous aider à atteindre votre vision créative. Nous sommes impatients de collaborer avec vous !

Note: I made some minor adjustments to the translation to maintain the original tone and format of the brochure, while ensuring that it sounds natural in French.