### **APAN 5400 - Group 1 Term Project**
*   Authors: David Skorodinsky, Jasper Chen
*   Date: December 6, 2024

### **Section 1: Create Company as a Spark Dataframe**

In [None]:
#!pip install -U pyspark

#### Run the block below to avoid a Python version mismatch between the Spark workers and the driver

In [None]:
import os
import sys
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable

In [None]:
os.environ['PYSPARK_PYTHON']
os.environ['PYSPARK_DRIVER_PYTHON']
!python3 -V

Python 3.10.12


#### Initiate and configure Spark Session and Context

In [None]:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Intro to Apache Spark") \
    .config("spark.cores.max", "4") \
    .config('spark.executor.memory', '8G') \
    .config('spark.driver.maxResultSize', '8g') \
    .config('spark.kryoserializer.buffer.max', '512m') \
    .config("spark.driver.cores", "4") \
    .getOrCreate()

sc = spark.sparkContext

print("Using Apache Spark Version", spark.version)

Using Apache Spark Version 3.5.3


#### Read the Companies dataset into Spark DataFrame and print the count of records

In [None]:
companies_sdf = spark.read.option("header", "true") \
                   .option("delimiter", ",") \
                   .option("inferSchema", "true") \
                   .csv("US_stocks_clean.csv")
# Check the CSV has loaded correctly by counting number of records.  Should indicate 3144 companies
companies_sdf.count()

3144

In [None]:
# Check the CSV has loaded correctly by displaying the columns in the dataframe
companies_sdf.columns

['Ticker',
 'Description',
 'Company Name',
 'Sector',
 'Industry Group',
 'Industry',
 'Sub-Industry',
 'Comment']

#### Register the Companies DataFrame as a SQL Temporary View

In [None]:
companies_sdf.createOrReplaceTempView("companies")

In [None]:
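# Retrieve an HTML table with the details of the first company whose name contains the search text
def getCompanyDetails(company_name):
  # Bind the search text as a named parameter (Spark 3.5 accepts Python values in args)
  # instead of concatenating it into the SQL string
  sqlDF = spark.sql(
      "SELECT `Company Name`, `Ticker`, `Description`, `Sector`, `Industry Group`, `Sub-Industry` "
      "FROM companies WHERE `Company Name` LIKE :pattern LIMIT 1",
      args={"pattern": "%" + company_name.upper() + "%"})

  # Collect once, then check for an empty result (avoids a separate count() job)
  details_pdf = sqlDF.toPandas()
  if details_pdf.empty:
    return ""

  # Convert the result from a DataFrame to an HTML table
  return details_pdf.to_html()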

In [None]:
# Test the function works by retrieving results for "Microsoft"
getCompanyDetails("Microsoft")

'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>Company Name</th>\n      <th>Ticker</th>\n      <th>Description</th>\n      <th>Sector</th>\n      <th>Industry Group</th>\n      <th>Sub-Industry</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>MICROSOFT CORP</td>\n      <td>MSFT</td>\n      <td>Microsoft Corporation is a technology company. The Company develops and supports software, services, devices, and solutions. Its segments include Productivity and Business Processes, Intelligent Cloud, and More Personal Computing. The Productivity and Business Processes segment consists of products and services in its portfolio of productivity, communication, and information services, spanning a variety of devices and platforms. This segment includes Office Consumer, LinkedIn, dynamics business solutions, and Office Commercial. The Intelligent Cloud segment consists of public, private, and hybrid serve
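
The lookup above is a case-insensitive "contains" match that returns only the first hit. A minimal pure-Python sketch of that behaviour follows; `first_match` and the three sample names are illustrative assumptions, not part of the notebook, and the sketch does not model the `%` and `_` wildcards of SQL `LIKE`.

```python
def first_match(company_names, query):
    """Mimic `WHERE name LIKE '%QUERY%' LIMIT 1` on an in-memory list.

    Assumes, as in the Companies dataset, that names are stored in upper case.
    """
    needle = query.upper()
    for name in company_names:
        if needle in name:
            return name
    return ""


# Illustrative names only; the real lookup runs against the `companies` view
sample = ["MICROSOFT CORP", "MICROSTRATEGY INC", "MICRON TECHNOLOGY INC"]

print(first_match(sample, "Microsoft"))  # MICROSOFT CORP
print(first_match(sample, "micro"))      # MICROSOFT CORP (first row that matches)
print(first_match(sample, "Tesla"))      # empty string: no match
```

Because only the first matching row is returned and the query has no `ORDER BY`, a short search term may resolve to a different company than the user intended.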

In [None]:
# Retrieve the sub-industry of the search company from the Company DataFrame
def getCompanySubindustry(company_name):
  # Bind the search text as a named parameter (Spark 3.5 accepts Python values in args)
  sqlDF = spark.sql(
      "SELECT `Sub-Industry` FROM companies WHERE `Company Name` LIKE :pattern LIMIT 1",
      args={"pattern": "%" + company_name.upper() + "%"})
  print("First Result Only:")
  sqlDF.show()

  # first() returns None when there is no matching row
  row = sqlDF.first()
  return row['Sub-Industry'] if row is not None else ""

In [None]:
# Test the function works by retrieving results for "Microsoft"
getCompanySubindustry("Microsoft")

First Result Only:
+----------------+
|    Sub-Industry|
+----------------+
|Systems Software|
+----------------+



'Systems Software'

In [None]:
# Retrieve the description of the search company with stopwords removed
import spacy

# Load the spaCy English model once, rather than on every call
nlp = spacy.load("en_core_web_sm")

def getCompanyDesc(company_name):
  # Bind the search text as a named parameter (Spark 3.5 accepts Python values in args)
  sqlDF = spark.sql(
      "SELECT `Description` FROM companies WHERE `Company Name` LIKE :pattern LIMIT 1",
      args={"pattern": "%" + company_name.upper() + "%"})
  print("First Result Only:")
  sqlDF.show()

  # first() returns None when there is no matching row
  row = sqlDF.first()
  if row is None or row['Description'] is None:
    return ""

  # Process the text using spaCy and remove stopwords
  doc = nlp(row['Description'])
  filtered_words = [token.text for token in doc if not token.is_stop]

  # Join the filtered words to form a clean text
  clean_text = ' '.join(filtered_words)

  # De-duplicate the words, keeping the first occurrence of each
  deduped_text = " ".join(dict.fromkeys(clean_text.split()))
  print("Company Description (after stopword removal and de-duplication):", deduped_text)

  return deduped_text

In [None]:
# Test the function works by retrieving results for "Microsoft"
getCompanyDesc("Microsoft")

First Result Only:
+--------------------+
|         Description|
+--------------------+
|Microsoft Corpora...|
+--------------------+

Company Description (after stopword removal and de-duplication): Microsoft Corporation technology company . Company develops supports software , services devices solutions segments include Productivity Business Processes Intelligent Cloud Personal Computing segment consists products portfolio productivity communication information spanning variety platforms includes Office Consumer LinkedIn dynamics business Commercial public private hybrid server cloud power modern businesses developers enterprise customers centre experience Windows gaming search news advertising


'Microsoft Corporation technology company . Company develops supports software , services devices solutions segments include Productivity Business Processes Intelligent Cloud Personal Computing segment consists products portfolio productivity communication information spanning variety platforms includes Office Consumer LinkedIn dynamics business Commercial public private hybrid server cloud power modern businesses developers enterprise customers centre experience Windows gaming search news advertising'
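
The two cleaning steps above (stopword removal, then order-preserving de-duplication with `dict.fromkeys`) can be illustrated without spaCy. This is only a sketch: `STOPWORDS` is a tiny hand-written set standing in for spaCy's stopword list, and `clean_description` is a hypothetical helper name.

```python
# Tiny illustrative stopword set; spaCy's real list is much larger
STOPWORDS = {"the", "is", "a", "and", "its", "of", "in"}


def clean_description(text):
    """Drop stopwords, then keep only the first occurrence of each remaining word."""
    kept = [word for word in text.split() if word.lower() not in STOPWORDS]
    # dict.fromkeys preserves insertion order, so word order is retained
    return " ".join(dict.fromkeys(kept))


print(clean_description("The Company develops software and the Company supports software"))
# Company develops software supports
```

The de-duplication is case-sensitive, which is why both "company" and "Company" survive in the Microsoft output above.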

#### Regex Tokenizer and Stopword Filter

In [None]:
from pyspark.ml.feature import RegexTokenizer, StopWordsRemover, Word2Vec
regexTokFilter = RegexTokenizer(gaps = False, pattern = '\\w+', inputCol = 'Description', outputCol = 'Tokens')
stopwordFilter = StopWordsRemover(inputCol = 'Tokens', outputCol = 'Tokens SW Removed')

In [None]:
companies_sdf_tok = regexTokFilter.transform(companies_sdf)
companies_sdf_swr = stopwordFilter.transform(companies_sdf_tok)
companies_sdf_subset = companies_sdf_swr.limit(40000)
companies_sdf_subset['Ticker','Company Name','Description','Tokens','Tokens SW Removed'].show()

+------+--------------------+--------------------+--------------------+--------------------+
|Ticker|        Company Name|         Description|              Tokens|   Tokens SW Removed|
+------+--------------------+--------------------+--------------------+--------------------+
|  CTVA|         CORTEVA INC|Corteva, Inc. is ...|[corteva, inc, is...|[corteva, inc, gl...|
|  ALCO|           ALICO INC|Alico, Inc. is an...|[alico, inc, is, ...|[alico, inc, agri...|
|  LMNR|        LIMONEIRA CO|Limoneira Company...|[limoneira, compa...|[limoneira, compa...|
|  SANW|         S&W SEED CO|S&W Seed Company ...|[s, w, seed, comp...|[w, seed, company...|
|   TRC|      TEJON RANCH CO|Tejon Ranch Co. i...|[tejon, ranch, co...|[tejon, ranch, co...|
|  CALM| CAL-MAINE FOODS INC|Cal-Maine Foods, ...|[cal, maine, food...|[cal, maine, food...|
|    BV| BRIGHTVIEW HOLDINGS|BrightView Holdin...|[brightview, hold...|[brightview, hold...|
|   CLF|CLEVELAND-CLIFFS INC|Cleveland-Cliffs ...|[cleveland, cliff...
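
With `gaps = False`, the `RegexTokenizer` pattern describes the tokens themselves rather than the separators, and tokens are lower-cased by default. A rough pure-Python equivalent is sketched below; `regex_tokenize` is a hypothetical helper, and `re.ASCII` is used on the assumption that the Java `\w` class matches only ASCII word characters by default.

```python
import re


def regex_tokenize(text):
    """Approximate RegexTokenizer(gaps=False, pattern='\\w+'): lower-case, then match word runs."""
    return re.findall(r"\w+", text.lower(), flags=re.ASCII)


print(regex_tokenize("S&W Seed Company"))       # ['s', 'w', 'seed', 'company']
print(regex_tokenize("Cal-Maine Foods, Inc."))  # ['cal', 'maine', 'foods', 'inc']
```

This matches the `Tokens` column above, where "S&W" is split into `s` and `w`, and the single letter `s` is then dropped by the stopword filter.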

#### Train Word2Vec model

In [None]:
word2vec = Word2Vec(vectorSize = 300, minCount = 5, inputCol = 'Tokens SW Removed', outputCol = 'Word Vectors')
model = word2vec.fit(companies_sdf_subset)
wordvectors = model.transform(companies_sdf_subset)
companies_sdf_w2v = wordvectors.select('Ticker','Company Name','Description','Word Vectors').rdd.toDF()
companies_sdf_w2v.show()

+------+--------------------+--------------------+--------------------+
|Ticker|        Company Name|         Description|        Word Vectors|
+------+--------------------+--------------------+--------------------+
|  CTVA|         CORTEVA INC|Corteva, Inc. is ...|[0.02270292518791...|
|  ALCO|           ALICO INC|Alico, Inc. is an...|[0.00728729476488...|
|  LMNR|        LIMONEIRA CO|Limoneira Company...|[0.02068193188752...|
|  SANW|         S&W SEED CO|S&W Seed Company ...|[0.00167447168268...|
|   TRC|      TEJON RANCH CO|Tejon Ranch Co. i...|[0.02104608487570...|
|  CALM| CAL-MAINE FOODS INC|Cal-Maine Foods, ...|[0.00471268486275...|
|    BV| BRIGHTVIEW HOLDINGS|BrightView Holdin...|[0.02837199383300...|
|   CLF|CLEVELAND-CLIFFS INC|Cleveland-Cliffs ...|[0.00986527417934...|
|   FCX|FREEPORT-MCMORAN INC|Freeport-McMoRan ...|[0.00493158789543...|
|   XPL| SOLITARIO ZINC CORP|Solitario Zinc Co...|[-2.8045181135990...|
|  SCCO|SOUTHERN COPPER CORP|Southern Copper C...|[0.00560024190
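
Each row's `Word Vectors` value is a single document vector obtained by averaging the vectors of the words in that description. The sketch below illustrates the idea with plain lists. It assumes, based on my reading of Spark's behaviour, that words outside the vocabulary (for example those seen fewer than `minCount` times) contribute nothing to the sum while the divisor remains the total token count; `document_vector` and `toy_vectors` are hypothetical names.

```python
def document_vector(tokens, word_vectors, size):
    """Average the known word vectors over all tokens in the document."""
    if not tokens:
        return [0.0] * size
    total = [0.0] * size
    for token in tokens:
        vec = word_vectors.get(token)
        if vec is not None:  # out-of-vocabulary words add nothing to the sum
            total = [t + v for t, v in zip(total, vec)]
    # Assumption: divide by the total token count, including unknown words
    return [t / len(tokens) for t in total]


# Two-dimensional toy vectors; the notebook's model uses vectorSize = 300
toy_vectors = {"seed": [1.0, 0.0], "company": [0.0, 1.0]}

print(document_vector(["seed", "company"], toy_vectors, 2))      # [0.5, 0.5]
print(document_vector(["seed", "unknownword"], toy_vectors, 2))  # [0.5, 0.0]
```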

#### Show synonyms of a selected word

In [None]:
synonyms = model.findSynonyms("health", 20)
synonyms.show()

+------------+------------------+
|        word|        similarity|
+------------+------------------+
|supplemental|0.8860935568809509|
|    accident|0.8362582921981812|
|        life|0.7563870549201965|
|     writing| 0.746462881565094|
|     annuity|0.7455814480781555|
|   voluntary|0.7429248690605164|
|  disability|0.7419498562812805|
|    personal| 0.736068606376648|
|    benefits|0.7342462539672852|
|   annuities|0.7281482815742493|
|    coverage|0.7253290414810181|
|   insurance|0.7221017479896545|
|        care|0.7189557552337646|
|     hospice|0.7019994854927063|
|     medical| 0.700878381729126|
|   liability|0.6906301975250244|
|  retirement|0.6871844530105591|
|    allstate|0.6810792088508606|
|       needs|0.6792890429496765|
|       plans|0.6724750995635986|
+------------+------------------+



In [None]:
companies_sdf_w2v_final = companies_sdf_w2v.collect()

#### Retrieve companies with descriptions similar (cosine) to input query

In [None]:
import numpy as np

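def cossim(v1, v2):
    # Cosine similarity: dot product divided by the product of the two vector norms.
    # Returns 0.0 when either vector has zero norm, instead of offsetting the denominator.
    norm_product = np.sqrt(np.dot(v1, v1)) * np.sqrt(np.dot(v2, v2))
    if norm_product == 0:
        return 0.0
    return np.dot(v1, v2) / norm_product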
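
As a quick sanity check of the definition, cosine similarity can be written with plain Python lists. This is a minimal sketch rather than the notebook's implementation; `cosine_similarity` is a hypothetical name, and returning 0.0 for a zero vector is a design choice made here.

```python
import math


def cosine_similarity(v1, v2):
    """cos(v1, v2) = (v1 . v2) / (|v1| * |v2|); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_similarity([0.0, 0.0], [1.0, 2.0]))  # 0.0 (zero vector guard)
```

Cosine similarity depends only on direction, so scaling either vector should leave the score unchanged. That property matters here because averaged Word2Vec vectors tend to have small norms.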

In [None]:
# Define a function to retrieve a list of similar companies based on the search company
import pandas as pd
def getSimCompanies(query_txt):
  query_df  = sc.parallelize([(1,query_txt)]).toDF(['index','Description'])
  query_tok = regexTokFilter.transform(query_df)
  query_swr = stopwordFilter.transform(query_tok)
  query_vec = model.transform(query_swr)
  query_vec = query_vec.select('Word Vectors').collect()[0][0]

  sim_rdd = sc.parallelize((i[0], i[1], i[2], float(cossim(query_vec, i[3]))) for i in companies_sdf_w2v_final)
  sim_df  = spark.createDataFrame(sim_rdd).\
                  withColumnRenamed('_1', 'Ticker').\
                  withColumnRenamed('_2', 'Company Name').\
                  withColumnRenamed('_3', 'Description').\
                  withColumnRenamed('_4', 'Similarity').\
                  orderBy("Similarity", ascending = False)
  #sim_df.show()

  pd.set_option('display.max_colwidth', 0)
  sim_pd = sim_df.toPandas().head(10)

  return sim_pd.to_html()

In [None]:
# Test the results - search for similar companies
getSimCompanies("Energy")

'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>Ticker</th>\n      <th>Company Name</th>\n      <th>Description</th>\n      <th>Similarity</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>MTZ</td>\n      <td>MASTEC INC</td>\n      <td>MasTec, Inc. is an infrastructure construction company. The Company\'s segments include Communications, Clean Energy and Infrastructure, Oil and Gas, and Power Delivery. The Communications segment provides engineering, construction, maintenance, and customer fulfillment activities related to communications infrastructure. The Clean Energy and Infrastructure segment serves energy, utility, Government and other end-markets through the installation and construction of power generation facilities, primarily from clean energy and renewable sources. Oil and Gas segment provides engineering, construction and maintenance services for pipelines and processing facilities 

In [None]:
# Test the result when we use a longer search string based on a company description
getSimCompanies("Microsoft Corporation technology company . Company develops supports software , services devices solutions segments include Productivity Business Processes Intelligent Cloud Personal Computing segment consists products portfolio productivity communication information spanning variety platforms includes Office Consumer LinkedIn dynamics business Commercial public private hybrid server cloud power modern businesses developers enterprise customers centre experience Windows gaming search news advertising")

'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>Ticker</th>\n      <th>Company Name</th>\n      <th>Description</th>\n      <th>Similarity</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>MSI</td>\n      <td>MOTOROLA SOLUTIONS INC</td>\n      <td>Motorola Solutions, Inc. provides communications and analytics solutions. The Company provides land mobile radio communications (LMR), video security and access control and command center software, video security and analytics, supported by managed and support services. Its segments include Products and Systems Integration Segment, and Software and Services Segment. The Products and Systems Integration segment offers a portfolio of infrastructure, devices, accessories and video security devices, including LMR, public safety long term evolution (LTE) and private LTE, as well as network video management infrastructure, fixed security and mobile video c
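
Since only the ten most similar companies are displayed, the ranking step does not need a full sort. One possible alternative, sketched here with hypothetical tickers and scores, is `heapq.nlargest` over the already collected rows.

```python
import heapq


def top_k_similar(rows, k=10):
    """Return the k rows with the highest similarity; rows are (ticker, name, similarity)."""
    return heapq.nlargest(k, rows, key=lambda row: row[2])


# Hypothetical rows for illustration only
scored = [
    ("AAA", "ALPHA CORP", 0.70),
    ("BBB", "BETA INC", 0.90),
    ("CCC", "GAMMA CO", 0.10),
]

print(top_k_similar(scored, k=2))
# [('BBB', 'BETA INC', 0.9), ('AAA', 'ALPHA CORP', 0.7)]
```

With roughly three thousand companies the difference is small, so this is a design option rather than a required change.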

### **Section 2: Establishing NewsAPI Connection**

In [None]:
!pip3 install -U pymongo

from datetime import date, timedelta
import requests
import json
from pymongo import MongoClient


Collecting pymongo
  Downloading pymongo-4.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)
Downloading dnspython-2.7.0-py3-none-any.whl (313 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.7.0 pymongo-4.10.1


#### Setting up NewsAPI

In [None]:
# Calculate date range (Free API allows for 30 days)
date_30_days_ago = (date.today() - timedelta(days=30)).strftime("%Y-%m-%d")

#Create URL for API
base_url = 'https://newsapi.org/v2/everything'
api_key = os.environ['NEWSAPI_KEY']  # Read the key from the environment (variable name is an assumption) instead of hard-coding it
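
The 30-day window can be checked against a fixed date so the result is reproducible. In this small sketch, `window_start` is a hypothetical helper that takes "today" as an argument instead of calling `date.today()`.

```python
from datetime import date, timedelta


def window_start(today, days=30):
    """Return the ISO date string `days` days before `today`."""
    return (today - timedelta(days=days)).strftime("%Y-%m-%d")


print(window_start(date(2024, 12, 6)))  # 2024-11-06
```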

In [None]:
# Combine data from first 5 pages & set other parameters
all_articles = []
for page in range(1, 6):
    params = {
        'q': 'news OR market OR technology OR business',
        'sortBy': 'popularity',
        'from': date_30_days_ago,
        'apiKey': api_key,
        'page': page, # Free API allows for 5 pages (500 articles)
        'excludeDomains': 'yahoo.com'
    }
    response = requests.get(base_url, params=params)
    if response.status_code == 200:
        articles = response.json().get('articles', [])
        all_articles.extend(articles)
    else:
        print(f"Failed page: {page}. Code: {response.status_code}")
        break
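
The paging loop above stops at the first failed request and keeps whatever was collected so far. The same control flow can be exercised without network access by passing in the fetch step as a function. In this sketch, `collect_pages` and `fake_fetch` are hypothetical, and the 429 status is just an example of a non-200 response.

```python
def collect_pages(fetch_page, max_pages=5):
    """Call fetch_page(page) -> (status_code, articles) until a page fails or max_pages is reached."""
    all_articles = []
    for page in range(1, max_pages + 1):
        status, articles = fetch_page(page)
        if status != 200:
            print(f"Failed page: {page}. Code: {status}")
            break
        all_articles.extend(articles)
    return all_articles


def fake_fetch(page):
    """Stand-in for requests.get: two good pages of three articles, then a failure."""
    if page <= 2:
        return 200, [{"title": f"article {page}-{i}"} for i in range(3)]
    return 429, []


articles = collect_pages(fake_fetch)
print(len(articles))  # 6
```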

In [None]:
# Save the API data to the NewsData JSON file
output_file = "NewsData.json"

news_dataset = {"articles": all_articles}
with open(output_file, 'w', encoding='utf-8') as json_file:
    json.dump(news_dataset, json_file, indent=1)
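
A round trip through a temporary file is a simple way to confirm the saved JSON can be read back unchanged. This sketch uses a made-up article and a hypothetical `save_and_reload` helper. It also passes `ensure_ascii=False`, an optional setting that keeps non-ASCII characters readable in the file instead of writing them as `\uXXXX` escapes.

```python
import json
import os
import tempfile


def save_and_reload(dataset, path):
    """Write the dataset as UTF-8 JSON, then load it back from disk."""
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(dataset, json_file, indent=1, ensure_ascii=False)
    with open(path, "r", encoding="utf-8") as json_file:
        return json.load(json_file)


# Made-up article for illustration
sample_dataset = {"articles": [{"title": "Café chain expands", "url": "https://example.com/a"}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    reloaded = save_and_reload(sample_dataset, os.path.join(tmp_dir, "NewsData.json"))

print(reloaded == sample_dataset)  # True
```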

#### Storing NewsAPI data in MongoDB

In [None]:
#collection.drop() # Use if running multiple times in Colab

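# MongoDB connection; credentials are read from the environment (variable names are assumptions) instead of being hard-coded
mongo_user = os.environ['MONGODB_USER']
mongo_password = os.environ['MONGODB_PASSWORD']
client = MongoClient(f'mongodb+srv://{mongo_user}:{mongo_password}@apancluster.yqrha.mongodb.net/apan5400?retryWrites=true&w=majority&appName=ApanCluster')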
db = client.apan5400
collection = db.News

In [None]:
# Insert data into MongoDB
if all_articles:
    collection.insert_many(all_articles)
    print(f"Inserted {len(all_articles)} articles.")
else:
    print("No articles found.")

Inserted 500 articles.
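
Re-running the insert cell adds the same articles again unless the collection is dropped first. One alternative is to de-duplicate the fetched list by URL before inserting. The sketch below covers only the in-memory part and does not touch MongoDB; `dedupe_by_url` and the sample articles are hypothetical.

```python
def dedupe_by_url(articles):
    """Keep the first article seen for each URL; articles without a URL are kept as-is."""
    seen_urls = set()
    unique = []
    for article in articles:
        url = article.get("url")
        if url is not None and url in seen_urls:
            continue
        if url is not None:
            seen_urls.add(url)
        unique.append(article)
    return unique


# Made-up articles for illustration
fetched = [
    {"title": "First", "url": "https://example.com/1"},
    {"title": "First (repeat)", "url": "https://example.com/1"},
    {"title": "Second", "url": "https://example.com/2"},
]

print([a["title"] for a in dedupe_by_url(fetched)])  # ['First', 'Second']
```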


### **Section 3: Build Flask Page (to run on Google Colab)**

In [None]:
from google.colab.output import eval_js
print(eval_js("google.colab.kernel.proxyPort(5000)"))

https://aa1cqxx3k45-496ff2e9c6d22116-5000-colab.googleusercontent.com/


#### Build an interactive page with an input box.

In [None]:
import re
from flask import Flask, request, render_template
#app = Flask(__name__, template_folder='drive/My Drive/APAN 5400 Term Project/Templates')
app = Flask(__name__, template_folder='')

# A single view handles both the initial GET (empty form) and the POST (search)
@app.route('/', methods=["GET", "POST"])
def my_form_post():
    if request.method != "POST":
        return render_template("my-form.html")

    val = request.form['userinput']

    # News results: case-insensitive title search.
    # re.escape makes the user input match as literal text rather than as a regex.
    query = {"title": {"$regex": re.escape(val), "$options": "i"}}
    results = collection.find(query, {"title": 1, "url": 1, "_id": 0})  # Fetch title and URL only
    news_results = "<br><br>".join(f"{doc['title']}: <a href='{doc['url']}'>Read more</a>" for doc in results)

    # Method 1: Search the Company DataFrame to get the Sub-Industry, then get similar companies
    # sub_ind = getCompanySubindustry(val)
    # if sub_ind != "": # Skip when there are no search results
    #   sim_companies = getSimCompanies(sub_ind)

    # Method 2: Search the Company DataFrame to get the Company Desc (with stopwords removed), then get similar companies
    sim_companies = ""
    comp_desc = getCompanyDesc(val)
    if comp_desc != "": # Skip when there are no search results
        sim_companies = getSimCompanies(comp_desc)

    # Render the HTML template, passing in the company details, news results and similar companies
    return render_template('my-form.html', search_results = getCompanyDetails(val), company_news=news_results, df_html = sim_companies)

if __name__ == "__main__":
    app.run()

 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:5000
INFO:werkzeug:Press CTRL+C to quit
INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 00:59:10] "GET /?authuser=0 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 00:59:10] "GET /favicon.ico?authuser=0 HTTP/1.1" 404 -


First Result Only:
+--------------------+
|         Description|
+--------------------+
|Microsoft Corpora...|
+--------------------+

Company Description (after stopword removal and de-duplication): Microsoft Corporation technology company . Company develops supports software , services devices solutions segments include Productivity Business Processes Intelligent Cloud Personal Computing segment consists products portfolio productivity communication information spanning variety platforms includes Office Consumer LinkedIn dynamics business Commercial public private hybrid server cloud power modern businesses developers enterprise customers centre experience Windows gaming search news advertising


INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 00:59:24] "POST /?authuser=0 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 00:59:24] "GET /favicon.ico?authuser=0 HTTP/1.1" 404 -


First Result Only:
+--------------------+
|         Description|
+--------------------+
|The Coca-Cola Com...|
+--------------------+

Company Description (after stopword removal and de-duplication): Coca - Cola Company beverage company . segments include Europe , Middle East Africa ; Latin America North Asia Pacific Global Ventures Bottling Investments owns licenses markets brands grouped categories sparkling flavors hydration sports coffee tea nutrition juice dairy plant based beverages emerging nonalcoholic soft drink Sprite Fanta Diet Coke Zero Sugar quarius Ayataka BODYARMOR Ciel Costa dogadan Dasani FUZE TEA Georgia glaceau smartwater vitaminwater Gold Peak Powerade AdeS Del Valle fairlife innocent Minute Maid Pulpy Simply products available consumers 200 countries


INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 00:59:58] "POST /?authuser=0 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 00:59:58] "GET /favicon.ico?authuser=0 HTTP/1.1" 404 -


First Result Only:
+--------------------+
|         Description|
+--------------------+
|UnitedHealth Grou...|
+--------------------+

Company Description (after stopword removal and de-duplication): UnitedHealth Group Incorporated diversified health care company operates Optum UnitedHealthcare platforms . Company segments include Health , Insight Rx provides wellness addressing physical emotional - related financial needs national delivery platform engages people settings including clinical sites home virtual serves systems physicians hospital plans state governments life sciences companies range pharmacy services retail pharmacies specialty community based infusion segment includes Employer & Individual Medicare Retirement Community State Global


INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:00:11] "POST /?authuser=0 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:00:11] "GET /favicon.ico?authuser=0 HTTP/1.1" 404 -


First Result Only:
+--------------------+
|         Description|
+--------------------+
|Chevron Corporati...|
+--------------------+

Company Description (after stopword removal and de-duplication): Chevron Corporation manages investments subsidiaries affiliates , provides administrative financial management technology support United States international engage integrated energy chemicals operations . Company operates business segments : Upstream Downstream segment consists primarily exploring developing producing crude oil natural gas ; processing liquefaction transportation regasification associated liquefied transporting export pipelines storage marketing - liquids plant refining petroleum products refined lubricants manufacturing renewable fuels commodity petrochemicals


INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:00:29] "POST /?authuser=0 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:00:29] "GET /favicon.ico?authuser=0 HTTP/1.1" 404 -


First Result Only:
+--------------------+
|         Description|
+--------------------+
|Amazon.com, Inc. ...|
+--------------------+

Company Description (after stopword removal and de-duplication): Amazon.com , Inc. provides range products services customers . offered stores include merchandise content purchased resale - party sellers manufactures sells electronic devices including Kindle Fire tablet TV Echo Ring develops produces media operates segments : North America International Amazon Web Services ( AWS ) segment consists global sales compute storage database start ups enterprises government agencies academic institutions advertising vendors publishers authors programs sponsored advertisements display video serves consumers online physical Customers access offerings websites mobile applications Alexa streaming physically visiting


INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:23:37] "POST /?authuser=0 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:23:37] "GET /favicon.ico?authuser=0 HTTP/1.1" 404 -


First Result Only:
+--------------------+
|         Description|
+--------------------+
|The Coca-Cola Com...|
+--------------------+

Company Description (after stopword removal and de-duplication): Coca - Cola Company beverage company . segments include Europe , Middle East Africa ; Latin America North Asia Pacific Global Ventures Bottling Investments owns licenses markets brands grouped categories sparkling flavors hydration sports coffee tea nutrition juice dairy plant based beverages emerging nonalcoholic soft drink Sprite Fanta Diet Coke Zero Sugar quarius Ayataka BODYARMOR Ciel Costa dogadan Dasani FUZE TEA Georgia glaceau smartwater vitaminwater Gold Peak Powerade AdeS Del Valle fairlife innocent Minute Maid Pulpy Simply products available consumers 200 countries


INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:25:17] "POST /?authuser=0 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:25:17] "GET /favicon.ico?authuser=0 HTTP/1.1" 404 -


First Result Only:
+--------------------+
|         Description|
+--------------------+
|Chevron Corporati...|
+--------------------+

Company Description (after stopword removal and de-duplication): Chevron Corporation manages investments subsidiaries affiliates , provides administrative financial management technology support United States international engage integrated energy chemicals operations . Company operates business segments : Upstream Downstream segment consists primarily exploring developing producing crude oil natural gas ; processing liquefaction transportation regasification associated liquefied transporting export pipelines storage marketing - liquids plant refining petroleum products refined lubricants manufacturing renewable fuels commodity petrochemicals


INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:25:51] "POST /?authuser=0 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:25:51] "GET /favicon.ico?authuser=0 HTTP/1.1" 404 -


First Result Only:
+-----------+
|Description|
+-----------+
+-----------+



INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:26:15] "POST /?authuser=0 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:26:15] "GET /favicon.ico?authuser=0 HTTP/1.1" 404 -


First Result Only:
+--------------------+
|         Description|
+--------------------+
|Alphabet Inc. is ...|
+--------------------+

Company Description (after stopword removal and de-duplication): Alphabet Inc. holding company . Company segments include Google Services , Cloud Bets segment includes products services ads Android Chrome hardware Maps Play Search YouTube infrastructure platform collaboration tools enterprise customers earlier stage technologies afield core business sale health technology Internet provides - ready cloud including Platform Workspace enables developers build test deploy applications Gmail Docs Drive Calendar Meet


INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:26:30] "POST /?authuser=0 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [05/Dec/2024 01:26:30] "GET /favicon.ico?authuser=0 HTTP/1.1" 404 -
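
The Flask view takes free-text input and uses it both in a MongoDB `$regex` filter and in generated HTML. The sketch below shows one way to treat that input as literal text in both places, using only the standard library. `build_title_query` and `format_news_link` are hypothetical helpers, and whether the HTML escaping is needed depends on how `my-form.html` renders the values, which is not shown in this notebook.

```python
import html
import re


def build_title_query(user_input):
    """Build a case-insensitive title filter that matches the input literally."""
    return {"title": {"$regex": re.escape(user_input), "$options": "i"}}


def format_news_link(title, url):
    """Format one news result, escaping characters that have meaning in HTML."""
    return f"{html.escape(title)}: <a href='{html.escape(url, quote=True)}'>Read more</a>"


query = build_title_query("C++")
print(query["title"]["$regex"])  # C\+\+

# The escaped pattern matches the literal text
print(bool(re.search(query["title"]["$regex"], "Learning C++ today", flags=re.IGNORECASE)))  # True

print(format_news_link("<b>Markets</b> rally", "https://example.com/a?x=1&y=2"))
# &lt;b&gt;Markets&lt;/b&gt; rally: <a href='https://example.com/a?x=1&amp;y=2'>Read more</a>
```

Without the escaping, a search such as "C++" would be interpreted as a regular expression rather than as the literal company or product name.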
