
#### 1. Initial Summarization Using Ollama
 
The journey began with a foundational step: analyzing the content of Kamatech Solutions' website.
Two language models, run through [Ollama](https://ollama.com/) (a free tool that runs models locally on your computer, with no paid API), were used to extract concise, informative summaries of the website's content.
The focus was on capturing the essence of Kamatech Solutions' products or services, their value proposition, their target audience and industry focus, and any notable aspects of their approach and methodology.
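The notebook below only shows calls to `mistral`, so how the two models were run is not documented. As a minimal sketch, assuming Ollama's default local endpoint `http://localhost:11434/api/generate`, the same prompt can be sent to each model in turn and the answers collected side by side. The function names are hypothetical, and the second model (for example `llama2`) is an assumption:

```python
import json
import urllib.request
from typing import Callable, Dict, List

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint (assumed)


def ollama_generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send one non-streaming request to a local Ollama server and return the text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8")).get("response", "").strip()


def summarize_with_models(
    prompt: str,
    models: List[str],
    generate: Callable[[str, str], str] = ollama_generate,
) -> Dict[str, str]:
    """Run the same prompt through each model and map model name -> summary."""
    # Model names are supplied by the caller; which two models were used is not stated
    results: Dict[str, str] = {}
    for model in models:
        try:
            results[model] = generate(model, prompt)
        except OSError as error:  # URLError and socket timeouts are OSError subclasses
            results[model] = f"Error: {error}"
    return results
```

For example, `summarize_with_models(prompt, ["mistral", "llama2"])` returns a dictionary keyed by model name, which makes the two summaries easy to compare. The `generate` parameter only exists so the Ollama call can be swapped out, for instance in tests.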

#### 2. Bilingual Summarization and Translation

Following the initial summarization, the same website content was summarized again, this time with an added layer of complexity: translation into French.
The translated summary maintained the core message while ensuring linguistic and cultural nuances were respected.
This bilingual approach highlighted the adaptability of the LLMs in handling multilingual tasks effectively.
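The code for this step is not shown in this notebook. A minimal sketch, assuming the summary and its translation are requested in a single prompt, builds a prompt that asks for an English summary followed by a French translation, then splits the model's answer on section labels. The helper names and the `ENGLISH:`/`FRENCH:` labels are hypothetical choices, not part of the original workflow:

```python
from typing import Tuple


def build_bilingual_prompt(website_text: str, target_language: str = "French", max_chars: int = 8000) -> str:
    """Build a prompt asking for an English summary followed by its translation."""
    # Truncate long pages so the prompt stays within the model's context window
    excerpt = website_text[:max_chars]
    return (
        "Summarize the following website content in 3-8 sentences, covering the "
        "company's products or services, value proposition, target audience and "
        "approach. Only include explicitly stated facts.\n"
        f"Then translate the summary into {target_language}, keeping the same "
        "meaning and using natural, idiomatic phrasing.\n"
        "Format the answer as:\n"
        "ENGLISH:\n<summary>\n"
        f"{target_language.upper()}:\n<translation>\n\n"
        f"Website content:\n{excerpt}"
    )


def split_bilingual_answer(answer: str, target_language: str = "French") -> Tuple[str, str]:
    """Split the model's answer into (english, translation) using the section labels."""
    marker = f"{target_language.upper()}:"
    english_part, _, translated_part = answer.partition(marker)
    return english_part.replace("ENGLISH:", "", 1).strip(), translated_part.strip()
```

The prompt string can be sent to a local Ollama model in the same way as the prompts in the code cells below. Splitting relies on the model reproducing the requested labels, which is not guaranteed, so the result is worth checking.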
        
#### Stepping Into LLM Engineering

This initial analysis served as a critical starting point for a broader LLM Engineering journey.
Future steps will include building advanced models, customizing them to solve domain-specific problems, refining natural language understanding capabilities, and expanding multilingual features even further.

```python
# Import libraries
import json

import requests
```

```python
# Fetch the website content and send it, unprocessed, to the model
try:
    response = requests.get('https://www.kamatechsolutions.com', timeout=30)
    response.raise_for_status()
    website_content = response.text  # Raw HTML, including markup, CSS and JavaScript

    # Create a JSON payload for the Ollama API
    payload = {
        "model": "mistral",
        "prompt": f"Summarize the following website content in 3-8 sentences: {website_content}"
    }

    # Call the Ollama API (assumes it is running locally on the default port, 11434)
    ollama_response = requests.post('http://localhost:11434/api/generate', json=payload, timeout=300)

    if ollama_response.status_code == 200:
        # Without "stream": False, Ollama streams its answer as one JSON object
        # per line; concatenate the 'response' fragment from each line
        response_text = ""
        for line in ollama_response.text.strip().split('\n'):
            if line:
                try:
                    response_json = json.loads(line)
                    response_text += response_json.get('response', '')
                except json.JSONDecodeError:
                    continue

        summary = response_text if response_text else "No summary generated"
    else:
        summary = f"Error: Received status code {ollama_response.status_code}"
except requests.exceptions.RequestException as e:
    summary = f"Error: {e}"

print("Website Summary:")
print(summary)
```

```text
Website Summary:
 This HTML code appears to be for a webpage with a blog layout. The page includes CSS, JavaScript, and PHP (presumably used on the server-side).

The page has a loading mechanism that hides all but the first three blog posts and shows "Load More" button at the end of the loaded posts. Clicking this button loads another 20 hidden blog posts, until there are no more hidden posts to load. This is done using AJAX calls to fetch additional data asynchronously.

The page also includes a progress bar that appears when an AJAX call is in progress and disappears once the data has been loaded successfully. Additionally, the code uses various jQuery plugins like LoadMorePushAjax for handling AJAX requests and managing content loading and navigation.
```


### Interpretation: 

The summary above describes the website's technical implementation rather than its actual content. This happens when the model is given the raw HTML, CSS and JavaScript of a page (here, `response.text` was sent unprocessed) instead of the text that human visitors would see.
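To see why this happens, one can compare the length of the raw HTML with the length of the text a visitor actually reads. The sketch below uses only Python's standard-library `html.parser`; the class and function names are hypothetical, and it is a rough check rather than a full text extractor (the BeautifulSoup approach later in this notebook is more thorough):

```python
from html.parser import HTMLParser
from typing import List

# Tags whose contents are never displayed to a visitor as text
HIDDEN_TAGS = {"script", "style", "noscript", "template"}


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes that are not inside <script>, <style> and similar tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hidden_depth = 0  # Greater than 0 while inside a hidden tag
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside hidden tags
        if self.hidden_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text_ratio(html: str) -> float:
    """Return the length of the visible text divided by the length of the raw HTML."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return len(text) / len(html) if html else 0.0
```

Calling `visible_text_ratio(website_content)` on the page fetched above gives a rough idea of how much of the prompt was readable text and how much was markup and code.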

**Here is an approach to improve the summary quality:**

```python
# Import libraries
import logging
from typing import Optional

import requests

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Constants
OLLAMA_API_URL = 'http://localhost:11434/api/generate'  # Default Ollama API endpoint
DEFAULT_MODEL = 'mistral'  # Default model (must already be pulled into Ollama)
DEFAULT_MAX_TOKENS = 200  # Maximum number of tokens to generate; adjust as needed
DEFAULT_TEMPERATURE = 0.5  # Sampling temperature; adjust as needed
REQUEST_TIMEOUT = 200  # Seconds (local generation can be slow)

def call_ollama_api(prompt: str, model: str = DEFAULT_MODEL, max_tokens: int = DEFAULT_MAX_TOKENS, temperature: float = DEFAULT_TEMPERATURE) -> Optional[str]:
    """Send a prompt to Ollama and return the generated text, or None on failure."""
    data = {
        'model': model,
        'prompt': prompt,
        'stream': False,  # Return a single JSON object instead of a stream
        # Ollama reads generation parameters from 'options';
        # 'num_predict' is its name for the maximum number of tokens
        'options': {
            'num_predict': max_tokens,
            'temperature': temperature,
        },
    }

    try:
        # Send the request to the Ollama API
        response = requests.post(OLLAMA_API_URL, json=data, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()

        # Extract the generated text
        return response.json().get('response', '').strip()

    except requests.exceptions.RequestException as e:
        logger.error(f"Request error calling Ollama API: {e}")
    except ValueError as e:
        logger.error(f"Could not parse Ollama API response: {e}")

    return None

# Example usage
if __name__ == "__main__":
    prompt = """Extract the following information from https://kamatechsolutions.com:
        1. Company name and business type
        2. Main products / services
        3. Value proposition and USP
        4. Target audience / industry
        5. Notable methodologies/approaches

        Present as concise bullet points. Only include explicitly stated facts.
        """
    summary = call_ollama_api(prompt)
    if summary:
        logger.info(f"Summary: {summary}")
    else:
        logger.error("Failed to generate summary.")
```

```text
INFO:__main__:Summary: 1. Company Name and Business Type:
     - Company Name: Kama Tech Solutions
     - Business Type: Software Development Company, IT Consulting Services

  2. Main Products / Services:
   - Custom Software Development
   - Mobile App Development (Android & iOS)
   - Web Development
   - IT Consultancy
   - Data Analysis and Visualization
   - Quality Assurance Testing

  3. Value Proposition and USP:
   - Quick turnaround time for projects without compromising quality
   - Highly skilled and experienced development team
   - Agile methodology and DevOps practices for efficient project delivery
   - Customer satisfaction is their top priority

  4. Target Audience / Industry:
   - Businesses across various industries requiring custom software solutions, app development, IT consulting, and data analysis services

  5. Notable Methodologies/Approaches:
   - Agile methodology for project management
   - DevOps practices for continuous integration and delivery
   - Lean
```

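#### Leveraging BeautifulSoup for web scraping
- The prompt in the previous cell contains only the URL. Ollama passes the prompt to the model as plain text and the model cannot open web pages, so the answer above is not based on the site's actual content.
- Use BeautifulSoup to extract the visible text from the HTML and include that text in the prompt.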

```python
import json
import time

import requests
from bs4 import BeautifulSoup

def summarize_website(url, model="mistral"):
    """Fetch a web page, extract its visible text and summarize it with the Ollama API."""
    print(f"Processing {url}...")
    start_time = time.time()

    try:
        # Fetch and parse the website content
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')

        # Remove non-content elements
        for element in soup(['script', 'style', 'meta', 'link', 'noscript', 'iframe']):
            element.decompose()

        # Extract the visible text and truncate it to 8000 characters to keep the prompt short
        text = soup.get_text(separator=' ', strip=True)
        text = text[:8000] + "..." if len(text) > 8000 else text

        print(f"- Content extracted ({len(text)} chars) in {time.time() - start_time:.2f}s")

        # Generate the summary with Ollama
        prompt = f"""
        Based on this website content, summarize:
        1. Company name and business type
        2. Main products or services offered
        3. Value proposition
        4. Target audience
        5. Notable methodologies

        Present as concise bullet points. Only include explicitly stated facts.

        Website content: {text}
        """

        ollama_start = time.time()

        # stream=True lets requests read Ollama's line-by-line JSON output as it arrives
        response = requests.post(
            'http://localhost:11434/api/generate',
            json={"model": model, "prompt": prompt},
            stream=True,
            timeout=300,  # Applies to connecting and to each wait for new data
        )

        # Concatenate the text fragments from the streamed JSON lines
        summary = ""
        if response.status_code == 200:
            for line in response.iter_lines():
                if line:
                    try:
                        data = json.loads(line)
                        summary += data.get('response', '')
                    except json.JSONDecodeError:
                        continue
        else:
            summary = f"Error: API returned status code {response.status_code}"

        print(f"- Summary generated in {time.time() - ollama_start:.2f}s")

        return summary

    except requests.exceptions.RequestException as e:
        return f"Error: {e}"

if __name__ == "__main__":
    total_start = time.time()
    summary = summarize_website('https://www.kamatechsolutions.com')
    print(f"\nTotal time: {time.time() - total_start:.2f}s")
    print("\nSUMMARY:")
    print(summary)
```

```text
Processing https://www.kamatechsolutions.com...
- Content extracted (8003 chars) in 1.02s
- Summary generated in 271.15s

Total time: 272.17s

SUMMARY:
1. Company name: Kamatech Solutions
    2. Main products or services offered: Advanced Analytics consulting services, Data Science, Machine Learning, Artificial Intelligence
    3. Value proposition: Unloads analytics burden, delivers high-value customized solutions, minimal disruption to clients, driven by data-driven insights and cutting-edge technologies
    4. Target audience: Businesses located in the US and around the world
    5. Key services: Data Science, Machine Learning, Big Data Analytics, Business Intelligence (BI), Data Analytics, Business Analytics, Database and System Design, among others.
```
