# Google News Chatbot

### Introduction

In this lesson, we will build a chatbot that answers questions using Google News search results.  To fetch those results, we'll use the [Python Google News library](https://pypi.org/project/GoogleNews/).

### Getting situated

You can get an overview of how the codebase works by looking at `console.py`.  You will also need to add your OpenAI API key to the `.env` file.

### Build the news scraper

* Navigate to the `scraper/news_scraper.py` file, and complete the following functions.

* `get_links(search_term, time_period)`
    * Use the GoogleNews library to search for news from the given time period.  The return value should be a list of URLs from the search results.
    * It should take `search_term` as the first argument, and `time_period` should default to one day.
    * Pass the corresponding test.
    
* `clean_link`
    * If you try to go to many of the links returned, you'll see that they do not work.  The issue is that they have additional text tacked on at the end.
    * `'https://phillysportsnetwork.com/2024/01/22/embiid-sixers-scoring-record/&ved=2ahUKEwjNvqH0z6yEAxX7SjABHXjuDgsQxfQBegQIARAC&usg=AOvVaw1Dc4SkSkyLvC9nf7Gh83GY'`
    * Write a `clean_link` function that takes in the returned link and returns only the text before the first ampersand (see the test).
    
* `get_text_from_link`
    * We filled this code out for you.  You can look at the input and output to see the work performed.  Because a request to a webpage returns HTML, the function uses Beautiful Soup to extract the text and join it together, skipping any text that sits inside a CSS or JavaScript tag.
    
* `clean_text(html_text)`
    * The resulting text may still have leading or trailing whitespace, newline characters (`\n`), or carriage return characters (`\r`).  Replace each newline character with a single space, remove the carriage return characters, and strip the leading and trailing whitespace (see the test).
    * Finally, this function should return only the first 15_000 characters of the string.  We shorten the text because we can only pass so many tokens (text) to our ML model.
* `build_sources_from(question)`
    * This function ties the other functions together.  Given the text of a question, it returns a list of `Source` instances (see the models folder).  Each `Source` instance should have its URL set, and the cleaned text stored as its `text` attribute.
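
The scraper steps above can be sketched roughly as follows.  This is one possible approach, not the reference solution, and the tests in the repo remain the source of truth.  The `"1d"` period string, the `MAX_CHARS` name, and the order of the cleaning steps are assumptions made for illustration.

```python
MAX_CHARS = 15_000  # assumed constant name; the limit itself comes from the lesson


def get_links(search_term, time_period="1d"):
    """Return the raw result URLs for search_term within time_period."""
    # Imported inside the function so the string helpers below can be
    # used even when the GoogleNews package is not installed.
    from GoogleNews import GoogleNews

    google_news = GoogleNews(period=time_period)  # "1d" is assumed to mean one day
    google_news.search(search_term)
    return google_news.get_links()


def clean_link(link):
    """Keep only the part of the link before the first ampersand."""
    return link.split("&")[0]


def clean_text(html_text):
    """Flatten the text to one line, trim it, and cap its length."""
    without_returns = html_text.replace("\r", "")      # drop carriage returns
    single_line = without_returns.replace("\n", " ")   # newline -> single space
    return single_line.strip()[:MAX_CHARS]             # trim, then keep the first 15_000 chars
```

For example, `clean_link("https://example.com/story/&ved=abc&usg=xyz")` returns `"https://example.com/story/"`, and `clean_text("  Hello\r\nworld \n")` returns `"Hello world"`.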

### LLM Model

* `build_context`
    * Take the list of `Source` instances and concatenate the text from each of the sources.  Separate the text of consecutive sources with two newline characters (see the test).

* `generate_prompt`
    * You can see that the `generate_prompt` function takes the list of sources, uses the `build_context` function to build the combined string, and then adds it to an initial prompt.
    
* `get_answer`
    * This function then passes the prompt to the model.
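
As a rough illustration of the first two functions, here is a minimal sketch.  The `Source` dataclass below is a hypothetical stand-in for the class in the models folder, and the `INITIAL_PROMPT` wording and the `generate_prompt(question, sources)` signature are assumptions, so check the repo's code and tests for the real ones.  The call to the model in `get_answer` is left out.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical stand-in for the Source model in the models folder."""
    url: str
    text: str


# Assumed wording; the codebase defines its own initial prompt.
INITIAL_PROMPT = "Answer the question using only the news articles below."


def build_context(sources):
    """Join the text of every source, separated by two newline characters."""
    return "\n\n".join(source.text for source in sources)


def generate_prompt(question, sources):
    """Combine the initial prompt, the question, and the built context."""
    context = build_context(sources)
    return f"{INITIAL_PROMPT}\n\nQuestion: {question}\n\n{context}"
```

With two sources whose texts are `"one"` and `"two"`, `build_context` returns `"one\n\ntwo"`.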


### Try it out

Now run the `console.py` file to see the chatbot in action.