
#### 1. Initial Summarization Using Ollama
 
The journey began with a foundational step: analyzing the content of Kamatech Solutions' website.
Leveraging [Ollama](https://ollama.com/), a free tool that runs language models locally on your own computer with no paid API, two language models were used to extract concise, informative summaries of the website's content.
The focus was on capturing the essence of Kamatech Solutions' products or services, their value proposition, their target audience and industry focus, and any notable aspects of their approach and methodology.
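
The focus areas above can be kept in one reusable prompt template, so every run asks the model the same questions. The following is a minimal sketch; `FOCUS_AREAS` and `build_summary_prompt` are hypothetical names, not part of the original notebook.

```python
# Hypothetical helper: turns the focus areas listed above into a single prompt.
FOCUS_AREAS = [
    "products or services",
    "value proposition",
    "target audience and industry focus",
    "notable aspects of their approach and methodology",
]


def build_summary_prompt(page_text: str, max_sentences: int = 5) -> str:
    """Return a summarization prompt that asks the model to cover each focus area."""
    bullets = "\n".join(f"- {area}" for area in FOCUS_AREAS)
    return (
        f"Summarize the following website content in at most {max_sentences} sentences.\n"
        f"Cover these points:\n{bullets}\n\n"
        f"Website content:\n{page_text}"
    )


prompt = build_summary_prompt("Kamatech Solutions offers data science consulting.")
print(prompt)
```

With the `ollama` Python package imported in the first cell, such a prompt could then be sent with `ollama.generate(model="mistral", prompt=prompt)` instead of a raw HTTP request.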

#### 2. Bilingual Summarization and Translation

Following the initial summarization, the same website content was summarized again, this time with an added step: translation into French.
The translated summary preserved the core message while respecting linguistic and cultural nuances.
This bilingual approach highlighted how well the LLMs adapt to multilingual tasks.
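
The notebook excerpt does not show how the French step was wired up. One way to structure it, assuming a two-pass design (summarize in English, then translate that summary), is sketched below. `summarize_then_translate` and `fake_generate` are hypothetical; in practice `generate` would wrap a call to the local Ollama server.

```python
from typing import Callable

# Any function that maps a prompt to model text, e.g. a thin wrapper around
# Ollama's /api/generate endpoint (assumed here, not shown).
Generate = Callable[[str], str]


def summarize_then_translate(page_text: str, generate: Generate, language: str = "French") -> dict:
    """Two-pass sketch: summarize in English, then translate the summary."""
    english = generate(f"Summarize this website content in 3-5 sentences:\n{page_text}")
    translated = generate(
        f"Translate the following summary into {language}. "
        f"Keep company and product names unchanged:\n{english}"
    )
    return {"en": english, language.lower(): translated}


# Stand-in model so the sketch runs without a local Ollama server.
def fake_generate(prompt: str) -> str:
    return "[translated]" if prompt.startswith("Translate") else "[summary]"


result = summarize_then_translate("Kamatech Solutions provides IT consulting.", fake_generate)
print(result)  # {'en': '[summary]', 'french': '[translated]'}
```

Translating the finished summary, rather than the whole page, keeps the second prompt short and lets the English and French versions be compared side by side.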
        
#### Stepping Into LLM Engineering

This initial analysis served as a critical starting point for a broader LLM Engineering journey.
Future steps will include building advanced models, customizing them to solve domain-specific problems, refining natural language understanding capabilities, and expanding multilingual features even further.

In [1]:
# Import Libraries

import requests
import subprocess
import json
import ollama

In [2]:
# Fetch the website content
response = requests.get('https://www.kamatechsolutions.com')
website_content = response.text

# Create a JSON payload for Ollama API
payload = {
    "model": "mistral",
    "prompt": f"Summarize the following website content in 3-5 sentences: {website_content}"
}

try:
    # Call Ollama API (assuming it's running locally on default port)
    ollama_response = requests.post('http://localhost:11434/api/generate', json=payload)
    
    # Check if response is valid
    if ollama_response.status_code == 200:
        # Parse the streaming response
        response_text = ""
        for line in ollama_response.text.strip().split('\n'):
            if line:
                try:
                    response_json = json.loads(line)
                    if 'response' in response_json:
                        response_text += response_json['response']
                except json.JSONDecodeError:
                    continue
        
        summary = response_text if response_text else "No summary generated"
    else:
        summary = f"Error: Received status code {ollama_response.status_code}"
except Exception as e:
    summary = f"Error: {str(e)}"

print("Website Summary:")
print(summary)

Website Summary:
 This HTML file contains the structure for a webpage displaying a blog. The page loads with three blog posts visible and a "Load More" button at the bottom to load additional posts. The JavaScript code adds a "Load More" button if it does not already exist, hides all but the first three blog posts, sets up an AJAX call to fetch more posts when the "Load More" button is clicked, and adjusts the UI as necessary (e.g., showing/hiding the "Load More" button and loading spinner) during this process. The AJAX function uses the LoadMorePushAjax() function to handle the request and response from the server. The CSS in this HTML file controls the styling of the blog posts, buttons, and loaders.
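
The line-by-line loop in the cell above is needed because `/api/generate` streams its reply by default: the body is newline-delimited JSON, one object per chunk, each carrying a text fragment in its `response` field. The sketch below parses such a body from a string (the sample chunks are made up) and also shows a payload with `"stream": False`, which asks Ollama to return a single JSON object instead.

```python
import json


def join_stream(body: str) -> str:
    """Concatenate the `response` fragments of a newline-delimited JSON body.

    Unlike the notebook cell, malformed lines raise instead of being skipped.
    """
    parts = []
    for line in body.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
    return "".join(parts)


# Made-up sample body in the shape Ollama streams back.
sample = (
    '{"response": "Kama", "done": false}\n'
    '{"response": "tech", "done": false}\n'
    '{"response": "", "done": true}'
)
print(join_stream(sample))  # Kamatech

# Alternative: disable streaming so the reply is one JSON object, readable as
# requests.post(url, json=payload).json()["response"].
payload = {"model": "mistral", "prompt": "Summarize this text.", "stream": False}
```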


### Interpretation

#### The summary above describes the website's technical implementation rather than its actual content. This often happens when the model analyzes the raw HTML, CSS, or JavaScript code instead of the text that human visitors actually see.
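
To see what "visible text" means in practice, here is a rough, dependency-free sketch using Python's standard-library `html.parser`. This is a swapped-in technique for illustration only; the next cell uses BeautifulSoup, which is more robust on real pages. `VisibleTextParser` and `visible_text` are hypothetical names, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping anything inside <script>, <style> or <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


html = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Kamatech</h1><script>loadMore()</script>"
    "<p>Data science consulting</p></body></html>"
)
print(visible_text(html))  # Kamatech Data science consulting
```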

#### Here is an approach to improve the summary quality

In [3]:
import time
from bs4 import BeautifulSoup

def process_streaming_response(response):
    """Extract and combine text from streaming API response"""
    full_response = ""
    for line in response.text.strip().split('\n'):
        if line.strip():
            try:
                response_json = json.loads(line)
                if 'response' in response_json:
                    full_response += response_json['response']
            except json.JSONDecodeError:
                continue
    return full_response

def main():
    start_total = time.time()  # Start timing the total process

    # Step 1: Fetch the website content
    
    print("Fetching website content...")
    fetch_start = time.time()
    response = requests.get('https://www.kamatechsolutions.com')
    website_content = response.text
    fetch_time = time.time() - fetch_start
    print(f"Website fetched in {fetch_time:.2f} seconds")

    # Step 2: Process the HTML content with BeautifulSoup
    
    print("Processing HTML content...")
    process_start = time.time()
    soup = BeautifulSoup(website_content, 'html.parser')
    
    # Remove script, style, and other non-content elements
    for element in soup(['script', 'style', 'meta', 'link', 'noscript', 'iframe']):
        element.decompose()
    
    # Get visible text
    visible_text = soup.get_text(separator=' ', strip=True)
    
    # Limit text length if needed (models have context limits)
    if len(visible_text) > 8000:
        visible_text = visible_text[:8000] + "..."
    
    process_time = time.time() - process_start
    print(f"HTML processed in {process_time:.2f} seconds")
    print(f"Extracted text length: {len(visible_text)} characters")

    # Step 3: Create a specific, detailed prompt
    
    detailed_prompt = f"""
    Based on the provided website content, create a comprehensive summary that includes:
    
    1. The company name and what type of business it is
    2. Their main products or services offered
    3. Their value proposition and unique selling points
    4. Their target audience or industry focus
    5. Any notable aspects of their approach or methodology
    
    Focus only on factual information directly stated on the website. Ignore any HTML or technical elements.
    
    Website content: {visible_text}
    """

    # Step 4: Make the API call to Ollama
    
    print("Generating summary...")
    payload = {
        "model": "mistral",  # Make sure you've pulled this model with 'ollama pull mistral'
        "prompt": detailed_prompt
    }

    try:
        # Call Ollama API and time the processing
        generate_start = time.time()
        ollama_response = requests.post('http://localhost:11434/api/generate', 
                                      json=payload)
        
        # Check if response is valid
        if ollama_response.status_code == 200:
            summary = process_streaming_response(ollama_response)
            if not summary:
                summary = "No summary generated"
        else:
            summary = f"Error: Received status code {ollama_response.status_code}"
            print(f"Raw API response: {ollama_response.text}")
        
        generate_time = time.time() - generate_start
        print(f"Summary generated in {generate_time:.2f} seconds")
        
    except Exception as e:
        summary = f"Error: {str(e)}"
        generate_time = time.time() - generate_start
        print(f"Error occurred after {generate_time:.2f} seconds")

    total_time = time.time() - start_total
    print(f"\nTotal processing time: {total_time:.2f} seconds")

    print("\nWebsite Summary:")
    print(summary)

if __name__ == "__main__":
    main()


Fetching website content...
Website fetched in 1.83 seconds
Processing HTML content...
HTML processed in 0.04 seconds
Extracted text length: 8003 characters
Generating summary...
Summary generated in 315.91 seconds

Total processing time: 317.78 seconds

Website Summary:
1. Company name: Kamatech Solutions
    - Type of business: IT Consulting services company

2. Main products/services offered:
    - Advanced Analytics consulting services
        - Data Science (DS)
        - Machine Learning (ML)
        - Artificial Intelligence (AI)
    - Big Data Analytics
    - Business Intelligence (BI)
    - Data Analytics
    - I Business Analytics
    - Database and System Design
    - Cloud services

3. Additional information:
    - The company offers expert advice and solutions for complex problems in various sectors.
    - They serve clients from multiple countries worldwide, including Afghanistan to Zambia.
    - Clients can reach out through email or phone calls for more information or t
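
The cell above cuts the text at 8,000 characters to stay within the model's context window, which is why the log reports 8,003 characters (the limit plus the appended `"..."`). Two refinements are sketched below under stated assumptions: cutting at the last word boundary instead of mid-word, and requesting a larger context window through Ollama's `num_ctx` option in the request's `options`. The value 8192 is an assumption; the usable maximum depends on the model and on available memory, and a larger window will likely make the already slow generation step slower still. `truncate_at_word` and `build_payload` are hypothetical names.

```python
def truncate_at_word(text: str, limit: int = 8000, marker: str = "...") -> str:
    """Cut text to at most `limit` characters (plus the marker), backing up to the last space."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    space = cut.rfind(" ")
    if space > 0:
        cut = cut[:space]
    return cut + marker


def build_payload(prompt: str, model: str = "mistral", num_ctx: int = 8192) -> dict:
    """Request body for /api/generate with a larger context window (assumed value)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # one JSON object instead of NDJSON chunks
        "options": {"num_ctx": num_ctx},  # context window size in tokens
    }


text = "Kamatech Solutions provides advanced analytics consulting"
print(truncate_at_word(text, limit=30))  # Kamatech Solutions provides...
```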

In [4]:
import time
from bs4 import BeautifulSoup

def process_streaming_response(response):
    """Extract and combine text from streaming API response"""
    full_response = ""
    for line in response.text.strip().split('\n'):
        if line.strip():
            try:
                response_json = json.loads(line)
                if 'response' in response_json:
                    full_response += response_json['response']
            except json.JSONDecodeError:
                continue
    return full_response

def main():
    start_total = time.time()  # Start timing the total process

    # Step 1: Fetch the website content
    print("Fetching website content...")
    fetch_start = time.time()
    response = requests.get('https://www.kamatechsolutions.com')
    website_content = response.text
    fetch_time = time.time() - fetch_start
    print(f"Website fetched in {fetch_time:.2f} seconds")

    # Step 2: Process the HTML content with BeautifulSoup
    print("Processing HTML content...")
    process_start = time.time()
    soup = BeautifulSoup(website_content, 'html.parser')
    
    # Remove script, style, and other non-content elements
    for element in soup(['script', 'style', 'meta', 'link', 'noscript', 'iframe']):
        element.decompose()
    
    # Get visible text
    visible_text = soup.get_text(separator=' ', strip=True)
    
    # Limit text length if needed (models have context limits)
    if len(visible_text) > 8000:
        visible_text = visible_text[:8000] + "..."
    
    process_time = time.time() - process_start
    print(f"HTML processed in {process_time:.2f} seconds")
    print(f"Extracted text length: {len(visible_text)} characters")

    # Step 3: Create a specific, detailed prompt
    detailed_prompt = f"""
    Based on the provided website content, create a comprehensive summary that includes:
    
    1. The company name and what type of business it is
    2. Their main products or services offered
    3. Their value proposition and unique selling points
    4. Their target audience or industry focus
    5. Any notable aspects of their approach or methodology
    
    Focus only on factual information directly stated on the website. Ignore any HTML or technical elements.
    
    Website content: {visible_text}
    """

    # Step 4: Make the API call to Ollama
    print("Generating summary...")
    payload = {
        "model": "mistral",  # Make sure you've pulled this model with 'ollama pull mistral'
        "prompt": detailed_prompt
    }

    try:
        # Call Ollama API and time the processing
        generate_start = time.time()
        ollama_response = requests.post('http://localhost:11434/api/generate', 
                                      json=payload)
        
        # Check if response is valid
        if ollama_response.status_code == 200:
            summary = process_streaming_response(ollama_response)
            if not summary:
                summary = "No summary generated"
        else:
            summary = f"Error: Received status code {ollama_response.status_code}"
            print(f"Raw API response: {ollama_response.text}")
        
        generate_time = time.time() - generate_start
        print(f"Summary generated in {generate_time:.2f} seconds")
        
    except Exception as e:
        summary = f"Error: {str(e)}"
        generate_time = time.time() - generate_start
        print(f"Error occurred after {generate_time:.2f} seconds")

    # Step 5 (optional): two-step approach, useful if the first summary is unsatisfactory.
    # This block always runs; comment it out to skip the two extra model calls.
    
    print("Generating refined summary (two-step approach)...")
    
    # First extract key elements from the website (requires 'ollama pull llama2')
    extract_payload = {
        "model": "llama2",
        "prompt": f"Extract only the company name, services, products, and main business purpose from this text: {visible_text}"
    }
    
    extract_response = requests.post('http://localhost:11434/api/generate', json=extract_payload)
    extracted_info = process_streaming_response(extract_response)
    
    # Then create a focused summary
    refine_payload = {
        "model": "mistral", 
        "prompt": f"Create a clear, concise summary of this business based on the extracted information: {extracted_info}"
    }
    
    refine_response = requests.post('http://localhost:11434/api/generate', json=refine_payload)
    refined_summary = process_streaming_response(refine_response)
    
    print("\nRefined Summary (two-step approach):")
    print(refined_summary)

    total_time = time.time() - start_total
    print(f"\nTotal processing time: {total_time:.2f} seconds")

    print("\nWebsite Summary:")
    print(summary)

if __name__ == "__main__":
    main()

Fetching website content...
Website fetched in 1.28 seconds
Processing HTML content...
HTML processed in 0.04 seconds
Extracted text length: 8003 characters
Generating summary...
Summary generated in 336.56 seconds
Generating refined summary (two-step approach)...

Refined Summary (two-step approach):
 Business Name: Unnamed

Industry: Renewable Energy Solutions

Location: Global operations with headquarters in San Francisco, USA.

Key Products/Services: The company specializes in providing clean and sustainable energy solutions to residential, commercial, and industrial sectors using solar power systems, wind turbines, and energy storage systems. They also offer consulting services for energy efficiency and renewable energy integration.

Notable Partnerships: Partnered with leading tech companies and government agencies worldwide to develop innovative renewable energy technologies and implement large-scale renewable energy projects.

Achievements: Recognized as a pioneer in the renewa