# Ollama Webpage Brochure
Before executing the code, ensure that the initial setup outlined in the Readme.md file has been completed.

In [20]:
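# pip installs (beautifulsoup4, ollama and openai are needed by the imports used in this notebook)

!pip install -q datasets requests torch peft bitsandbytes transformers trl accelerate sentencepiece tiktoken matplotlib beautifulsoup4 ollama openai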

In [21]:
# Imports
import json
import os
import re
from typing import List

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display

In [22]:
# Constants

OLLAMA_API = "http://localhost:11434/api/chat"  # Ollama's local server listens on port 11434 by default
HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2"  # Name of the model served by Ollama

### Test: Verifying Connection with Ollama

In [23]:
# Create a messages list. It uses the same format as the OpenAI API.

messages = [
    {"role": "user", "content": "Describe some of the business applications of Generative AI"}
]

payload = {
        "model": MODEL,
        "messages": messages,
        "stream": False
    }



There are three ways to call Ollama from Python:

- Direct HTTP call: send HTTP POST requests to the Ollama API.
- `ollama` Python package: a higher-level interface to the model that abstracts away the direct HTTP requests.
- OpenAI Python library: point the OpenAI client at Ollama's OpenAI-compatible endpoint.

All three make web requests locally, from this machine to itself. They connect to the llama3.2 model that is served by Ollama and running at localhost:11434.


<u> Recommendation:</u>
- If you are primarily using Ollama: Go with ollama Python Package. It’s the cleanest and most idiomatic solution specifically designed for Ollama.
- If you are using both OpenAI and Ollama APIs: Use OpenAI Library (OpenAI) with custom configuration. This provides a unified approach to handling both APIs.
- For advanced or custom needs: Use Direct HTTP Calls (requests.post) for full control and flexibility, especially if you need to customize headers, handle retries, or debug raw API responses.
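
The retry handling mentioned in the last bullet could look roughly like the sketch below. It is not part of the original notebook: `call_with_retries` and its parameters are hypothetical names, and `send` stands in for whatever callable performs the request.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    send: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[Exception], ...] = (Exception,),
) -> T:
    """Call `send` until it succeeds or `attempts` is used up (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return send()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

In this notebook it might be called as `call_with_retries(lambda: requests.post(OLLAMA_API, json=payload, headers=HEADERS, timeout=60), retry_on=(requests.exceptions.RequestException,))`, so that only network-level failures are retried.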

In [24]:
response = requests.post(OLLAMA_API, json=payload, headers=HEADERS)
print(response.json()['message']['content'])

Generative AI has numerous business applications across various industries, including:

1. **Content Generation**: AI-powered content generation can automate tasks such as writing articles, social media posts, and product descriptions, freeing up human resources for more strategic work.
2. **Image and Video Creation**: Generative AI can create high-quality images and videos for marketing materials, product visualization, and other visual communications.
3. **Chatbots and Virtual Assistants**: AI-powered chatbots can provide 24/7 customer support, helping businesses respond to customer inquiries and reduce the workload on human agents.
4. **Predictive Analytics**: Generative AI models can analyze vast amounts of data to predict future trends, patterns, and outcomes, enabling businesses to make more informed decisions.
5. **Marketing Automation**: AI-powered marketing automation can help personalize customer experiences, optimize ad campaigns, and streamline lead generation processes.
6.

In [25]:
import ollama

response = ollama.chat(model=MODEL, messages=messages)
print(response['message']['content'])

Generative AI has numerous business applications across various industries, including:

1. **Content Generation**: AI-powered tools can generate high-quality content such as articles, blog posts, social media posts, and even entire books. This helps businesses save time and resources while maintaining consistency in their content.
2. **Product Design and Development**: Generative AI can assist designers and engineers in creating new products by generating 3D models, prototypes, and even entire product lines. This accelerates the design process and reduces costs.
3. **Marketing and Advertising**: AI-generated content, such as social media posts, product descriptions, and ad copy, can help businesses personalize their marketing efforts and increase engagement with customers.
4. **Customer Service**: Generative AI-powered chatbots can provide 24/7 customer support, responding to common queries and routing complex issues to human agents.
5. **Data Analysis and Insights**: Generative AI alg

In [26]:
from openai import OpenAI
ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

response = ollama_via_openai.chat.completions.create(
    model=MODEL,
    messages=messages
)

print(response.choices[0].message.content)

Generative AI, also known as Generative Models or Generative Adversarial Networks (GANs), has numerous business applications across various industries. Here are some examples:

1. **Content Generation**: Generative AI can create high-quality content such as images, videos, text, and music. This can be used for:
	* Automated content creation: generating news articles, social media posts, or product descriptions.
	* Image editing and enhancement: automating photo editing tasks to reduce costs.
	* Music composition: creating jingles or background music for ads or videos.
2. **Data Analysis and Visualization**: Generative AI can analyze large datasets and generate insights, reports, and visualizations:
	* Predictive analytics: generating forecasts, trends, and patterns from historical data.
	* Data enrichment: enhancing data with new features, entities, or relationships.
3. **Marketing Automation**: Generative AI can automate marketing tasks, such as:
	* Personalized advertising: generatin

### Project

#### First Step: Figure out which links are relevant
The model reads the links found on a webpage and responds in structured JSON. It should decide which links are relevant and replace relative links such as "/about" with "https://company.com/about".

We will use "one-shot prompting", in which the prompt includes an example of how the model should respond. This is an excellent use case for an LLM because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage: it would be very hard!
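
Resolving relative links does not have to be left entirely to the model. As a minimal sketch (not part of the original notebook), the standard library's `urllib.parse.urljoin` can make links absolute before or after the LLM step. `absolutize_links` is a hypothetical helper name, and `https://company.com/` is the placeholder domain from the text above.

```python
from typing import List
from urllib.parse import urljoin, urlparse


def absolutize_links(base_url: str, links: List[str]) -> List[str]:
    """Resolve links against base_url, keep only http(s) results, drop duplicates."""
    seen = set()
    result = []
    for link in links:
        absolute = urljoin(base_url, link)  # "/about" becomes "https://company.com/about"
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skips mailto:, tel:, javascript: and similar
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


print(absolutize_links(
    "https://company.com/",
    ["/about", "mailto:hi@company.com", "https://company.com/about", "careers"],
))
```

Doing this step in code keeps the model's job to the part that needs judgment, namely deciding which links are relevant.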

In [27]:
# Web Scraping 


class Website:
    """Represent a webpage: fetch it, then extract its title, text and links."""

    def __init__(self, url):
        """Create this Website object from the given URL using the BeautifulSoup library."""
        self.url = url
        # Defaults, so the object is still usable if the request fails
        self.body = b""
        self.title = "No title found"
        self.text = ""
        self.links = []

        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()  # Raises HTTPError for 4xx/5xx responses
            self.body = response.content
            soup = BeautifulSoup(self.body, 'html.parser')
            if soup.title and soup.title.string:
                self.title = soup.title.string
            if soup.body:
                # Remove elements that carry no useful text
                for irrelevant in soup.body(["script", "style", "img", "input"]):
                    irrelevant.decompose()
                self.text = soup.body.get_text(separator="\n", strip=True)

            links = [link.get('href') for link in soup.find_all('a')]  # Gather every link referenced on this page
            self.links = [link for link in links if link]

        except requests.exceptions.RequestException as e:
            print(f"Error fetching the URL: {e}")

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"


In [28]:

#Website Class Testing

try:
    law = Website("https://www.law.com/?slreturn=20250125165349")
    print(law.title)
    print(law.links)

except Exception as e:
    print(f"Error initializing Website object: {e}")



Law.com | The Premier Source for Global Legal News & Analysis
['https://store.law.com/Registration/Newsletters.aspx?promoCode=PAR&source=https://www.law.com/', '/events/', 'https://store.law.com/Registration/Login.aspx?promoCode=PAR&source=https://www.law.com/', '/', '/mylaw', '/pro/', 'https://store.law.com/Registration/Login.aspx?promoCode=PAR&source=https://www.law.com/', '/', '/', '/static/about-us/', '/events/', 'https://twitter.com/lawdotcom', 'https://www.linkedin.com/company/law-com/', 'https://www.facebook.com/LawdotcomALM/', 'http://feeds.feedblitz.com/law/legal-news/', 'https://www.law.com/pro/', 'https://www.law.com/pro-mid-market/', 'https://www.law.com/global-leaders-in-law/', 'https://www.law.com/global-leaders-in-law-advisers/', 'https://www.law.com/private-client-global-elite/', 'https://www.law.com/', 'https://www.law.com/radar/', 'https://www.law.com/americanlawyer/', 'https://www.law.com/corpcounsel/', 'https://www.law.com/nationallawjournal/', 'https://www.law.com/

In [29]:
# Prompt setup for obtaining the links

# We need to create a structure the APIs understand:
# [ {"role": "system", "content": "system message goes here"},
#   {"role": "user", "content": "user message goes here"} ]


# One-shot prompting
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""


def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt


def messages_for(website):
    return [
        {"role": "system", "content": link_system_prompt},
        {"role": "user", "content": get_links_user_prompt(website)}
    ]


In [30]:
# Function that combines the web scraping step with the model call that picks out the relevant links.
import ollama

def get_links(url):
    website = Website(url)  # Web scraping 
    try:
        response = ollama.chat(model=MODEL, messages=messages_for(website))  # LLM inference for retrieving the links
        print(response) 

        # The model sometimes adds text around the JSON, so extract only the JSON part.
        # re.DOTALL lets "." match newlines, so the pattern can span multiple lines.
        json_pattern = r'(\{.*\})'  # Greedy match from the first "{" to the last "}"
        match = re.search(json_pattern, response['message']['content'], re.DOTALL)
        if match:
            json_str = match.group(1)  # Extract the matched JSON string
            links = json.loads(json_str)  # Parse the JSON data
            return links  # Return only the links
        else:
            print("Error: No JSON found in the response")
            return None
    except Exception as e:
        print(f"Error during LLM API call: {e}")
        return None
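
The greedy pattern above grabs everything from the first `{` to the last `}`, which breaks if the model uses braces in its commentary after the JSON. One possible alternative, sketched here as an assumption and not as the notebook's method, is to let `json.JSONDecoder.raw_decode` parse the first valid object it finds. `extract_first_json` is a hypothetical helper name and the sample reply is made up for illustration.

```python
import json
from typing import Optional


def extract_first_json(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)  # parses one value beginning at `start`
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # this "{" did not start valid JSON; try the next one
        start = text.find("{", start + 1)
    return None


# Illustrative reply: JSON wrapped in commentary that itself contains braces
reply = (
    'Here are the links:\n'
    '{"links": [{"type": "about page", "url": "https://example.com/about"}]}\n'
    'Note: I skipped {privacy} pages.'
)
print(extract_first_json(reply))
```

Because the decoder stops at the end of the first complete object, trailing notes from the model do not affect the result.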

In [31]:
# Test the get_links function

links=get_links("https://huggingface.co")
print(links)
#print(type(links))

model='llama3.2' created_at='2025-01-26T05:35:03.095612Z' done=True done_reason='stop' total_duration=10466875250 load_duration=17781292 prompt_eval_count=702 prompt_eval_duration=1936000000 eval_count=255 eval_duration=8501000000 message=Message(role='assistant', content='Here are the relevant links for a brochure about the company, Hugging Face:\n\n```\n{\n  "links": [\n    {"type": "About page", "url": "https://huggingface.co/"},\n    {"type": "Company page", "url": "https://huggingface.co/about"},\n    {"type": "Enterprise page", "url": "/enterprise"},\n    {"type": "Pricing page", "url": "/pricing"},\n    {"type": "Jobs/Careers page", "url": "https://apply.workable.com/huggingface/"},\n    {"type": "Blog", "url": "docs/blog"}\n  ]\n}\n```\n\nNote that I\'ve excluded the following links as they are not directly related to company information:\n\n* Relative links (e.g. /models, /datasets, etc.)\n* Login and join links\n* Email links\n* Terms of Service and Privacy pages\n* Social me

#### Second step: make the brochure!

In [32]:
# Execute a function that integrates web scraping to gather general information from the landing page, utilizes a model to extract various links, and verifies the 
# accessibility of each link generated by the model. If a link is accessible, the function retrieves its content; if not, the link is skipped. 
# This ensures that all necessary information for the subsequent model is readily available.

def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url) or {"links": []}  # get_links returns None when the model call or JSON parsing fails
    print(links)
    for link in links.get("links", []):
        print(link)
        result += f"\n\n{link['type']}\n"
        try:
            response = requests.get(link["url"]) # Attempt to fetch the content of the link
            if response.status_code == 200: 
                result += Website(link["url"]).get_contents()
            else:
                # Print an error and skip this link
                print(f"Skipping broken link: {link['url']} - Status code: {response.status_code}")
                
        except requests.exceptions.RequestException as e:
            # Catch any exception (e.g., network issues, DNS errors, etc.)
            print(f"Skipping broken link: {link['url']} - Error: {e}")
            
    return result

print(get_all_details("https://huggingface.co"))
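
Because the link list comes from a model, its shape is not guaranteed: entries may lack a key or carry a relative URL such as `docs/blog`. A small validation pass before the loop is one way to avoid a `KeyError` or a failed request. This is a sketch with a hypothetical helper name, `valid_link_entries`, not something the original notebook does.

```python
from typing import Any, Dict, List


def valid_link_entries(data: Any) -> List[Dict[str, str]]:
    """Keep only entries shaped like {"type": str, "url": "http(s)://..."}."""
    if not isinstance(data, dict):
        return []
    entries = data.get("links")
    if not isinstance(entries, list):
        return []
    cleaned = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue  # ignore stray strings or numbers in the list
        link_type, url = entry.get("type"), entry.get("url")
        if (
            isinstance(link_type, str)
            and isinstance(url, str)
            and url.startswith(("http://", "https://"))
        ):
            cleaned.append({"type": link_type, "url": url})
    return cleaned
```

With this in place, the loop could iterate over `valid_link_entries(get_links(url))` and rely on both keys being present.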


model='llama3.2' created_at='2025-01-26T05:35:09.252142Z' done=True done_reason='stop' total_duration=4610592458 load_duration=12546583 prompt_eval_count=702 prompt_eval_duration=92000000 eval_count=155 eval_duration=4503000000 message=Message(role='assistant', content='{\n    "links": [\n        {"type": "Company page", "url": "https://huggingface.co"},\n        {"type": "About us", "url": "https://discuss.huggingface.co"},\n        {"type": "Blog", "url": "https://blog.huggingface.co"},\n        {"type": "GitHub repository", "url": "https://github.com/huggingface"},\n        {"type": "Twitter handle", "url": "https://twitter.com/huggingface"},\n        {"type": "LinkedIn page", "url": "https://www.linkedin.com/company/huggingface/"},\n        {"type": "Discord community", "url": "https://join.huggingface.co/discord"}\n    ]\n}', images=None, tool_calls=None)
{'links': [{'type': 'Company page', 'url': 'https://huggingface.co'}, {'type': 'About us', 'url': 'https://discuss.huggingface.

In [33]:
# Prompt setup for creating the brochure. The user prompt calls the previous function to gather the content of each link.

system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."


def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += f"Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt

get_brochure_user_prompt("HuggingFace", "https://huggingface.co")
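
Slicing at a fixed character count can cut a line in half. If that matters, a variant that backs up to the last newline is easy to write. The helper name `truncate_at_line` is hypothetical; the 5,000-character default is taken from the cell above.

```python
def truncate_at_line(text: str, limit: int = 5_000) -> str:
    """Truncate text to at most `limit` characters, preferring to cut at a newline."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n", 0, limit)  # last newline inside the allowed prefix
    if cut <= 0:
        return text[:limit]  # no usable newline: fall back to a hard cut
    return text[:cut]
```

The result is never longer than `limit`, so it could replace the slice `user_prompt[:5_000]` without increasing the prompt size.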


model='llama3.2' created_at='2025-01-26T05:35:20.708013Z' done=True done_reason='stop' total_duration=5864479375 load_duration=17189333 prompt_eval_count=702 prompt_eval_duration=103000000 eval_count=178 eval_duration=5741000000 message=Message(role='assistant', content='Here is the list of relevant links in JSON format:\n\n{\n    "links": [\n        {\n            "type": "About page",\n            "url": "https://huggingface.co"\n        },\n        {\n            "type": "Company page",\n            "url": "https://huggingface.co/brand"\n        },\n        {\n            "type": "Careers/Jobs page",\n            "url": "https://apply.workable.com/huggingface/"\n        },\n        {\n            "type": "GitHub repository",\n            "url": "https://github.com/huggingface"\n        },\n        {\n            "type": "Twitter handle",\n            "url": "https://twitter.com/huggingface"\n        },\n        {\n            "type": "LinkedIn page",\n            "url": "https://www

'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nTrending on\nthis week\nModels\ndeepseek-ai/DeepSeek-R1\nUpdated\n3 days ago\n•\n109k\n•\n2.62k\ndeepseek-ai/DeepSeek-R1-Distill-Qwen-32B\nUpdated\n3 days ago\n•\n86.1k\n•\n484\nhexgrad/Kokoro-82M\nUpdated\n1 day ago\n•\n35.2k\n•\n2.39k\ndeepseek-ai/DeepSeek-R1-Zero\nUpdated\n3 days ago\n•\n5.42k\n•\n434\ndeepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B\nUpdated\n3 days ago\n•\n94k\n•\n381\nBrowse 400k+ models\nSpaces\nRunning\non\nZero\n1.54k\n❤️\nKokoro TTS\nNow in 5 l

In [None]:
# Function that creates the brochure
def create_brochure(company_name, url):

    response = ollama.chat(model=MODEL, messages=[  # LLM inference
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ]) 
    result = response['message']['content']
    display(Markdown(result))


create_brochure("HuggingFace", "https://huggingface.com")

model='llama3.2' created_at='2025-01-26T05:35:36.70609Z' done=True done_reason='stop' total_duration=7072689250 load_duration=13047792 prompt_eval_count=665 prompt_eval_duration=1450000000 eval_count=171 eval_duration=5606000000 message=Message(role='assistant', content='Here is the list of relevant links in JSON format:\n\n{\n    "links": [\n        {"type": "About page", "url": "https://huggingface.com/"},\n        {"type": "Company page", "url": "https://huggingface.co/"},\n        {"type": "Careers/Jobs page", "url": "https://apply.workable.com/huggingface/"},\n        {"type": "Blog", "url": "https://blog.huggingface.co/"},\n        {"type": "Discussions forum", "url": "https://discuss.huggingface.co/"},\n        {"type": "Status page", "url": "https://status.huggingface.co/"},\n        {"type": "GitHub repository", "url": "https://github.com/huggingface"}\n    ]\n}', images=None, tool_calls=None)
{'links': [{'type': 'About page', 'url': 'https://huggingface.com/'}, {'type': 'Comp

# Hugging Face: Building the Future of AI
=============================================

Welcome to Hugging Face, the leading platform for machine learning (ML) innovation and collaboration. Our mission is to empower a global community of ML practitioners to create, discover, and collaborate on cutting-edge models, datasets, and applications.

## About Us
---------------

At Hugging Face, we believe that AI has the potential to transform industries and improve lives. We're dedicated to building the foundation of ML tooling with our community, enabling anyone to work with state-of-the-art models, datasets, and applications.

## Our Mission
-----------------

*   **Collaboration**: Host and collaborate on unlimited public models, datasets, and applications.
*   **Innovation**: Explore all modalities: text, image, video, audio, or even 3D.
*   **Education**: Build your portfolio, share your work with the world, and build your ML profile.

## Community
-------------

Our community is diverse and passionate about AI. With over 50,000 organizations using Hugging Face, you'll find a network of like-minded individuals working together to advance the field.

### Partnerships

We partner with leading companies like:

*   **Meta**: AI at Meta
*   **Amazon Web Services**: Deploy on optimized inference endpoints or update your Spaces applications to a GPU in a few clicks.
*   **Google**: Google: company
*   **Intel**: Intel: company
*   **Microsoft**: Microsoft: company
*   **Grammarly**: Grammarly: company

### Open Source Projects

We're committed to open source. Explore our projects:

*   **Transformers**: 138,018
*   **Diffusers**: 27,273
*   **Safetensors**: 3,019
*   **Tokenizers**: 9,278
*   **PEFT**: 17,067

## Spaces
----------

Our platform allows you to run models locally in-browser or deploy on optimized inference endpoints.

### Featured Models

*   **DeepSeek-R1**: Next-generation reasoning model that runs locally in-browser.
*   **TRELLIS**: Scalable and Versatile 3D Generation from images.

## Get Involved
-----------------

Join our community to collaborate, learn, and grow with the latest AI advancements. Explore our:

### Resources

*   [Documentation](link)
*   [Blog](link)
*   [Forum](link)

### Social Media

*   [Twitter](link)
*   [LinkedIn](link)
*   [Discord](link)