<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Gen AI Experiments](https://img.shields.io/badge/Gen%20AI%20Experiments-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://github.com/buildfastwithai/gen-ai-experiments)
[![Gen AI Experiments GitHub](https://img.shields.io/github/stars/buildfastwithai/gen-ai-experiments?style=for-the-badge&logo=github&color=gold)](http://github.com/buildfastwithai/gen-ai-experiments)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1fxgp-ioRJnxsSk4XeMeTFuLT4mP8dzzf?usp=sharing)



## Master Generative AI in 8 Weeks
**What You'll Learn:**
- Master cutting-edge AI tools & frameworks
- 6 weeks of hands-on, project-based learning
- Weekly live mentorship sessions
- Join Innovation Community

Learn by building. Get expert mentorship and work on real AI projects.
[Start Your Journey](https://www.buildfastwithai.com/genai-course)

### Install Libraries

In [None]:
%%capture
!pip install scrapegraphai --upgrade
!apt install chromium-chromedriver
!pip install nest_asyncio
!pip install playwright
!playwright install

In [None]:
import nest_asyncio
nest_asyncio.apply()

### Set Up API Keys

In [None]:
from google.colab import userdata

OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
GEMINI_API_KEY = userdata.get('GEMINI_API_KEY')
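
Outside Colab, `google.colab.userdata` cannot be imported. A minimal sketch of a fallback that reads the same key names from environment variables instead; `get_secret` is a hypothetical helper written for this notebook, not part of any library.

```python
import os


def get_secret(name, default=None):
    """Return a secret from Colab's userdata if available, else from os.environ.

    Hypothetical helper for illustration only.
    """
    try:
        # This import only succeeds inside Google Colab.
        from google.colab import userdata
    except ImportError:
        return os.environ.get(name, default)
    try:
        return userdata.get(name)
    except Exception:
        # Secret missing or notebook access not granted: fall back to the environment.
        return os.environ.get(name, default)


OPENAI_API_KEY = get_secret("OPENAI_API_KEY")
GEMINI_API_KEY = get_secret("GEMINI_API_KEY")
```

If neither source has the key, the variables are `None`, so it may be worth checking them before building the graph config.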

# **OpenAI Implementation**

Define the graph configuration for OpenAI.

In [None]:
graph_config_openai = {
    "llm": {
        "api_key": OPENAI_API_KEY,
        "model": "gpt-3.5-turbo",
        "temperature":0,
    },
    "verbose":True,
}

## **`SmartScraperGraph`**



> **SmartScraperGraph** is a class representing one of the default scraping pipelines. It uses a direct graph implementation in which each node has its own function, from retrieving the HTML of a website to extracting the information relevant to your query and generating a coherent answer.
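
The idea of a pipeline in which each node does one job and passes its state to the next can be sketched in plain Python. This is a toy illustration, not ScrapeGraphAI's implementation: the node functions, the `run_pipeline` helper, and the sample HTML are all assumptions, and the retrieval (RAG) and LLM steps are replaced by simple stand-ins.

```python
import re
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def fetch_node(state: State) -> State:
    # Stand-in for downloading the page: the "source" is already HTML here.
    return {**state, "html": state["source"]}


def parse_node(state: State) -> State:
    # Stand-in for HTML parsing: strip tags naively and collapse whitespace.
    text = re.sub(r"<[^>]+>", " ", state["html"])
    return {**state, "text": " ".join(text.split())}


def generate_answer_node(state: State) -> State:
    # Stand-in for the LLM call: pair the prompt with the parsed text.
    return {**state, "answer": {"prompt": state["prompt"], "content": state["text"]}}


def run_pipeline(nodes: List[Node], state: State) -> State:
    # Run the nodes in order, feeding each node the previous node's state.
    for node in nodes:
        print(f"--- Executing {node.__name__} ---")
        state = node(state)
    return state


final = run_pipeline(
    [fetch_node, parse_node, generate_answer_node],
    {"prompt": "List the apps.", "source": "<ul><li>Chat with PDF</li><li>Gita GPT</li></ul>"},
)
print(final["answer"])
```

Keeping each step as a separate function over a shared state dict is what makes it easy to add or swap a node without touching the others.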




Create the SmartScraperGraph instance and run it

In [None]:
from scrapegraphai.graphs import SmartScraperGraph


smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the apps with their descriptions.",
    # also accepts a string with the already downloaded HTML code
    source="https://buildfastwithai.com/apps",
    config=graph_config_openai
)

result = smart_scraper_graph.run()

--- Executing Fetch Node ---
--- Executing Parse Node ---
--- Executing RAG Node ---
--- (updated chunks metadata) ---
--- (tokens compressed and vector stored) ---
--- Executing GenerateAnswer Node ---


Processing chunks: 100%|██████████| 1/1 [00:00<00:00, 787.81it/s]


In [None]:
import json

print(json.dumps(result,indent=2))

{
  "apps": [
    {
      "name": "Chat with PDF",
      "description": "Chat with your PDFs. Free!!!"
    },
    {
      "name": "Gita GPT",
      "description": "Chat with lord krishna"
    },
    {
      "name": "Youtube Bot",
      "description": "Talk to youtube videos"
    },
    {
      "name": "Character AI",
      "description": "Talk to your favourite characters"
    },
    {
      "name": "GPT Data Analyst",
      "description": "Talk to your data"
    },
    {
      "name": "Interview Bot",
      "description": "Get interview questions answered"
    }
  ]
}
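
The result printed above is a plain dict, so ordinary Python can reshape it. A small sketch that turns the apps into CSV text, using a sample copied from that output; `apps_to_csv` is a hypothetical helper, and the keys `apps`, `name`, and `description` come from this particular LLM response, so they may differ between runs.

```python
import csv
import io

# Sample copied from the output above (shortened to two entries).
sample_result = {
    "apps": [
        {"name": "Chat with PDF", "description": "Chat with your PDFs. Free!!!"},
        {"name": "Gita GPT", "description": "Chat with lord krishna"},
    ]
}


def apps_to_csv(result):
    """Serialize result["apps"] to CSV text; missing keys become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "description"], extrasaction="ignore"
    )
    writer.writeheader()
    for app in result.get("apps", []):
        writer.writerow(app)
    return buffer.getvalue()


csv_text = apps_to_csv(sample_result)
print(csv_text)
```

Using `result.get("apps", [])` keeps the helper from failing when the model returns a differently shaped answer; it then produces only the header row.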


## **`SearchGraph`**


> This graph **transforms** the user prompt into an **internet search query**, fetches the relevant URLs, and starts the scraping process. It is similar to the **SmartScraperGraph** but adds the **SearchInternetNode** node.
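
The flow described here (prompt, then search query, then URLs, then one scrape per URL) can be sketched with stub functions. This is a toy under stated assumptions, not the library's code: `search_and_scrape` and the three `fake_*` stubs are hypothetical, the URLs are placeholders, and the real graph also combines the per-URL answers into a single result, which this sketch leaves out.

```python
from typing import Callable, Dict, List


def search_and_scrape(
    prompt: str,
    make_query: Callable[[str], str],
    search: Callable[[str], List[str]],
    scrape: Callable[[str, str], Dict],
    max_results: int = 3,
) -> Dict[str, Dict]:
    """Turn the prompt into a query, look up URLs, and scrape each one."""
    query = make_query(prompt)
    print(f"Search Query: {query}")
    urls = search(query)[:max_results]
    return {url: scrape(prompt, url) for url in urls}


def fake_make_query(prompt: str) -> str:
    # Stand-in for the LLM that writes the search query: only trims the period.
    return prompt.rstrip(".")


def fake_search(query: str) -> List[str]:
    # Stand-in for a search engine call: returns placeholder URLs.
    return [f"https://example.com/{i}" for i in range(5)]


def fake_scrape(prompt: str, url: str) -> Dict:
    # Stand-in for running a scraper graph on one URL.
    return {"source": url, "answer": f"stub answer for: {prompt}"}


results = search_and_scrape(
    "top 10 LMSYS LLMs.", fake_make_query, fake_search, fake_scrape
)
print(list(results))
```

Passing the query, search, and scrape steps in as callables keeps the orchestration testable without network access or an API key.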



In [None]:
from scrapegraphai.graphs import SearchGraph

search_graph = SearchGraph(
    prompt="List me top 10 LMSYS LLMs.",
    config=graph_config_openai
)

result = search_graph.run()

--- Executing SearchInternet Node ---
Search Query: top 10 LMSYS LLMs
--- Executing GraphIterator Node with batchsize 16 ---


processing graph instances:   0%|          | 0/3 [00:00<?, ?it/s]
--- Executing Fetch Node ---
--- Executing Fetch Node ---
--- Executing Fetch Node ---
--- Executing Parse Node ---
--- Executing RAG Node ---
--- (updated chunks metadata) ---
--- (tokens compressed and vector stored) ---
--- Executing GenerateAnswer Node ---



Processing chunks: 100%|██████████| 2/2 [00:00<00:00, 1343.68it/s]


processing graph instances:  33%|███▎      | 1/3 [00:19<00:38, 19.30s/it]
--- Executing Parse Node ---
--- Executing Parse Node ---
--- Executing RAG Node ---
--- (updated chunks metadata) ---
--- Executing RAG Node ---
--- (updated chunks metadata) ---
--- (tokens compressed and vector stored) ---
--- Executing GenerateAnswer Node ---



Processing chunks: 100%|██████████| 4/4 [00:00<00:00, 3430.92it/s]
--- (tokens compressed and vector stored) ---
--- Execut

In [None]:
print(json.dumps(result,indent=2))

{
  "Top 10 LMSYS LLMs": [
    {
      "Name": "GPT-4 Turbo",
      "Organization": "OpenAI",
      "Knowledge Cutoff": "December 2023",
      "License": "Proprietary (owned by OpenAI)",
      "How to Access": "ChatGPT Plus subscribers for $20 per month, update through Microsoft\u2019s Copilot",
      "Parameters Trained": "Estimated around 175 billion",
      "Key Features": [
        "Faster and more efficient",
        "Better at understanding context",
        "Versatile in tasks",
        "Focus on safety and ethics",
        "Learns from users"
      ]
    },
    {
      "Name": "Claude 3 Opus",
      "Organization": "Anthropic",
      "Knowledge Cutoff": "August 2023",
      "License": "Proprietary",
      "How to Access": "Talk to Claude 3 Opus for $20/month, access through Anthropic\u2019s API",
      "Parameters Trained": "Estimated to be within the same range as other large language models",
      "Key Features": [
        "Enhanced reasoning capabilities",
        "Multilin

# **Gemini Implementation**

Define the graph configuration for Gemini.

In [None]:
graph_config_gemini = {
    "llm": {
        "api_key": GEMINI_API_KEY,
        "model": "gemini-pro",
    },
    "verbose":True
}
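
The OpenAI and Gemini configs in this notebook differ only in the API key, the model name, and the optional temperature. A minimal sketch of a helper that builds either one; `make_graph_config` is a hypothetical name, the key strings are placeholders, and the dict shape simply mirrors `graph_config_openai` and `graph_config_gemini` as written in this notebook.

```python
def make_graph_config(api_key, model, temperature=None, verbose=True):
    """Build a config dict in the shape used in this notebook."""
    llm = {"api_key": api_key, "model": model}
    if temperature is not None:
        # Only include temperature when the caller sets it explicitly.
        llm["temperature"] = temperature
    return {"llm": llm, "verbose": verbose}


# Placeholder keys; in the notebook these would be OPENAI_API_KEY and GEMINI_API_KEY.
openai_config = make_graph_config("YOUR_OPENAI_KEY", "gpt-3.5-turbo", temperature=0)
gemini_config = make_graph_config("YOUR_GEMINI_KEY", "gemini-pro")
print(openai_config)
print(gemini_config)
```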

## **`SmartScraperGraph`**

The rest is the same as in the OpenAI implementation; only the config passed to the graph changes.

In [None]:
from scrapegraphai.graphs import SmartScraperGraph

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the apps with their descriptions.",
    # also accepts a string with the already downloaded HTML code
    source="https://buildfastwithai.com/apps",
    config=graph_config_gemini  # use the Gemini config defined above
)

In [None]:
result = smart_scraper_graph.run()

--- Executing Fetch Node ---
--- Executing Parse Node ---
--- Executing RAG Node ---
--- (updated chunks metadata) ---
--- (tokens compressed and vector stored) ---
--- Executing GenerateAnswer Node ---
Processing chunks: 100%|██████████| 1/1 [00:00<00:00, 176.60it/s]


In [None]:
print(json.dumps(result, indent=2))

{
  "apps": [
    {
      "name": "Chat with PDF",
      "description": "Chat with your PDFs. Free!!!"
    },
    {
      "name": "Gita GPT",
      "description": "Chat with lord krishna"
    },
    {
      "name": "Youtube Bot",
      "description": "Talk to youtube videos"
    },
    {
      "name": "Character AI",
      "description": "Talk to your favourite characters"
    },
    {
      "name": "GPT Data Analyst",
      "description": "Talk to your data"
    },
    {
      "name": "Interview Bot",
      "description": "Get interview questions answered"
    }
  ]
}
