We are excited to invite you to complete a coding assignment as part of our hiring process for the AI/ML Engineer Intern position. The previous assignment gave you a case study about an electric vehicle (EV) manufacturing plant that wants to use AI, automation, and IoT to optimize its business operations.



### Assignment
https://www.lizmotors.com/internship-coding-assignment2024/
+ In this assignment, we would like you to demonstrate how you would approach the task of gathering the required information for the case study using Python programming.
+ Specifically, we want you to write a Python script that searches the internet for information related to __Canoo__, a __publicly traded company__ listed on __NASDAQ__ (ticker symbol: __GOEV__). 
+ Your script should retrieve data from various online sources and store it in a CSV file for further analysis.
+ Here are the specific tasks you should perform:
    1. Identify the industry in which Canoo operates, along with its size, growth rate, trends, and key players.
    1. Analyze Canoo's main competitors, including their market share, products or services offered, pricing strategies, and marketing efforts.
    1. Identify key trends in the market, including changes in consumer behavior, technological advancements, and shifts in the competitive landscape.
    1. Gather information on Canoo's financial performance, including its revenue, profit margins, return on investment, and expense structure.

#### Submission
In your submission, please include the following:
1. Your name, contact details, and email address.
1. A brief summary of the steps you took to complete the task, including any challenges you faced and how you overcame them.
1. A link to a GitHub repository containing your Python script and any necessary dependencies.
1. A sample output of the data retrieved from the internet, stored in a CSV file or other suitable format.


In [4]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [26]:
from googlesearch import search
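def search_google(query, max_results=10):
    """Return up to max_results result URLs for the given Google query."""
    links = []
    # num is the page size and stop caps the total number of results,
    # so both are set to the same value
    for link in search(query, tld="com", num=max_results, stop=max_results, pause=1):
        links.append(link)
    return links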

In [29]:
queries = [
    "Canoo financial performance revenue, profit margins, return on investment, expense structure", 
    "Canoo industry size, growth rate, trends, key players",
    "Canoo main competitors market share, products, pricing, marketing", 
    "Market trends changes in consumer behavior, technological advancements, competitive landscape"
    ]

for query in queries:
    print(f"Query that is being searched right now : {query}\n")
    url_list = search_google(query)
    print(f"List of all the urls for the above query : {url_list}\n\n")

Query that is being searched right now : Canoo financial performance revenue, profit margins, return on investment, expense structure

List of all the urls for the above query : ['https://investors.canoo.com/financial-information/income-statement', 'https://investors.canoo.com/sec-filings/all-sec-filings/content/0001628280-23-009932/0001628280-23-009932.pdf', 'https://www.wsj.com/market-data/quotes/GOEV/financials', 'https://in.investing.com/equities/hennessy-capital-acquisition-corp-financial-summary', 'https://investors.canoo.com/sec-filings/annual-reports/content/0001628280-22-004514/0001628280-22-004514.pdf', 'https://www.sec.gov/Archives/edgar/data/1750153/000121390021001996/fs12021_canooinc.htm', 'https://www.theglobeandmail.com/investing/markets/stocks/GOEV/pressreleases/23693074/where-will-canoo-stock-be-in-1-year/', 'https://finpedia.co/bin/Canoo%20Inc./', 'https://investors.canoo.com/sec-filings/all-sec-filings/content/0001140361-24-002743/0001140361-24-002743.pdf', 'https://
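
The loop above prints each result list but does not keep it. Below is a minimal sketch of one way to retain the URLs per query, drop duplicates across queries, and separate PDF links from ordinary pages (the same `.pdf` check that `scrape_data` relies on). The names `collect_urls`, `split_by_type`, and `fake_search` are hypothetical; `fake_search` only stands in for `search_google` so the sketch runs offline.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def collect_urls(queries: Iterable[str],
                 search_fn: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Run search_fn for each query and keep the de-duplicated URLs per query."""
    seen = set()
    results: Dict[str, List[str]] = {}
    for query in queries:
        unique = []
        for url in search_fn(query):
            if url not in seen:  # skip URLs already returned by an earlier query
                seen.add(url)
                unique.append(url)
        results[query] = unique
    return results


def split_by_type(urls: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate PDF links from all other links."""
    pdfs, pages = [], []
    for url in urls:
        (pdfs if url.lower().endswith(".pdf") else pages).append(url)
    return pdfs, pages


# Hypothetical stand-in for search_google, returning fixed example URLs.
def fake_search(query: str) -> List[str]:
    return ["https://example.com/a.pdf", "https://example.com/b"]


found = collect_urls(["q1", "q2"], fake_search)
pdfs, pages = split_by_type(found["q1"])
```

In the notebook, `search_google` could be passed as `search_fn` in place of `fake_search`.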

In [31]:
from selenium.webdriver.chrome.service import Service
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By  # needed for By.TAG_NAME below

from langchain.document_loaders import PyMuPDFLoader
from langchain.retrievers import ArxivRetriever

def scrape_data(url):
    SCRAPED_DATA = dict()
    service = Service(executable_path="chromedriver.exe")
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')  # Run Chrome in headless mode (no GUI)
    options.add_argument('--ignore-certificate-errors')
    options.add_argument("--enable-javascript")
    options.add_argument("--no-sandbox")
    options.add_experimental_option("prefs", {"download_restrictions": 3})
    options.add_experimental_option('excludeSwitches', ['enable-logging'])
    driver = webdriver.Chrome(options=options, service=service)

    print(f"scraping url: {url}...")
    driver.get(url)

    delay = 3
    try:
        WebDriverWait(driver, delay).until(
            EC.presence_of_element_located((By.TAG_NAME, "body")))
        print("Page is ready!")
    except TimeoutException:
        print("Loading took too much time!")

    # check if url is a pdf or arxiv link
    if url.endswith(".pdf"):
        loader = PyMuPDFLoader(url)
        text = str(loader.load())
    elif "arxiv" in url:
        doc_num = url.split("/")[-1]
        retriever = ArxivRetriever(load_max_docs=2)
        text = retriever.get_relevant_documents(query=doc_num)[0].page_content
    else:
        page_source = driver.execute_script("return document.body.outerHTML;")
        soup = BeautifulSoup(page_source, "html.parser")
        # BeautifulSoup already returns Unicode text, so no explicit
        # re-encoding step is needed here.

        # for script in soup(["script", "style"]):
        #     script.extract()

        text = ""
        tags = ["h1", "h2", "h3", "h4", "h5", "p"]
        for element in soup.find_all(tags):
            text += element.text + "\n"

    # Split the extracted text into whitespace-separated tokens, one per line
    lines = (line.strip() for line in text.splitlines())
    chunks = (token.strip() for line in lines for token in line.split(" "))
    tokens = "\n".join(chunk for chunk in chunks if chunk)

    SCRAPED_DATA[url] = tokens
    print("scraped data added")
    driver.quit()
    return SCRAPED_DATA

ModuleNotFoundError: No module named 'pwd'
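
The heading/paragraph extraction and the tokenization step in `scrape_data` can be tried on their own, without a browser or any third-party import. The following is a rough sketch using only the standard library's `html.parser`; it is a swapped-in technique, not the BeautifulSoup path used above, and it assumes every heading and paragraph tag is properly closed (BeautifulSoup is more forgiving). `HeadingParagraphParser` and `extract_tokens` are hypothetical names.

```python
from html.parser import HTMLParser

TEXT_TAGS = {"h1", "h2", "h3", "h4", "h5", "p"}


class HeadingParagraphParser(HTMLParser):
    """Collect the text found inside heading and paragraph tags."""

    def __init__(self):
        super().__init__()
        self._depth = 0    # > 0 while inside one of TEXT_TAGS
        self.blocks = []   # one string per outermost heading/paragraph

    def handle_starttag(self, tag, attrs):
        if tag in TEXT_TAGS:
            if self._depth == 0:
                self.blocks.append("")
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in TEXT_TAGS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.blocks[-1] += data


def extract_tokens(html: str) -> str:
    """Return the heading/paragraph text as one token per line."""
    parser = HeadingParagraphParser()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.blocks)
    # Same tokenization as in scrape_data
    lines = (line.strip() for line in text.splitlines())
    chunks = (token.strip() for line in lines for token in line.split(" "))
    return "\n".join(chunk for chunk in chunks if chunk)


sample = ("<body><h1>Canoo Inc.</h1><script>var x = 1;</script>"
          "<p>Electric  vehicles</p></body>")
tokens = extract_tokens(sample)
```

Script content is skipped because it sits outside the selected tags. Pages that render their content with JavaScript would still need the Selenium path.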

In [32]:
import pwd

ModuleNotFoundError: No module named 'pwd'
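
`pwd` is a standard-library module that exists only on Unix, so `import pwd` fails on Windows. The traceback above is cut off, so it does not show which import pulled `pwd` in; one of the `langchain` imports is a plausible candidate, but that is an assumption. A minimal sketch of guarding such optional imports so the rest of the cell still runs (`optional_import` is a hypothetical helper):

```python
import importlib


def optional_import(module_name: str, attr: str):
    """Return module_name.attr, or None if it cannot be imported here."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr)
    except (ImportError, AttributeError) as exc:
        # ModuleNotFoundError (e.g. for 'pwd' on Windows) is an ImportError
        print(f"skipping {module_name}.{attr}: {exc}")
        return None


# Each name is None when the import is unavailable in this environment.
PyMuPDFLoader = optional_import("langchain.document_loaders", "PyMuPDFLoader")
ArxivRetriever = optional_import("langchain.retrievers", "ArxivRetriever")
```

With this in place, `scrape_data` could check `PyMuPDFLoader is None` and skip PDF links instead of failing at import time.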

In [17]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# function to get industry information
def scrape_industry_info():
    industry_info = {
        'Industry': 'Electric Vehicle Manufacturing',
        'Size': 'Estimated $xxx billion',
        'Growth Rate': 'Projected xx% CAGR',
        'Trends': 'Growing adoption of EVs, increasing focus on sustainability',
        'Key Players': ['Tesla', 'NIO', 'Rivian']
    }
    return industry_info

# function to get competitor information
def scrape_competitors_info():
    competitors_info = [
        {'Name': 'Tesla', 'Market Share': 'xx%', 'Products/Services': 'Electric vehicles, energy products', 'Pricing Strategy': 'Premium pricing', 'Marketing Efforts': 'High-profile events, social media campaigns'},
        {'Name': 'NIO', 'Market Share': 'xx%', 'Products/Services': 'Electric vehicles, battery swapping', 'Pricing Strategy': 'Competitive pricing', 'Marketing Efforts': 'Brand ambassadors, digital marketing'},
        {'Name': 'Rivian', 'Market Share': 'xx%', 'Products/Services': 'Electric vehicles, electric vans, electric trucks', 'Pricing Strategy': 'Premium pricing', 'Marketing Efforts': 'Partnerships with Amazon, adventure-focused marketing'}
    ]
    return competitors_info

# function to get market trends
def scrape_market_trends():
    market_trends = [
        'Increasing demand for electric vehicles (EVs)',
        'Advancements in battery technology',
        'Shift towards autonomous driving technology',
        'Rising interest in sustainable transportation solutions' 
        ]
    return market_trends

# function to get financial performance data
def scrape_financial_performance():
    financial_data = {
        'Revenue': '$xxx million',
        'Profit Margins': 'xx%',
        'Return on Investment': 'xx%',
        'Expense Structure': {'R&D': 'xx%', 'Marketing': 'xx%', 'Operations': 'xx%'}
    }
    return financial_data

def main():
    industry_info = scrape_industry_info()
    competitors_info = scrape_competitors_info()
    market_trends = scrape_market_trends()
    financial_performance = scrape_financial_performance()
    data = {
        'Category': ['Industry Information', 'Competitors Information', 'Market Trends', 'Financial Performance'],
        'Data': [industry_info, competitors_info, market_trends, financial_performance]
    }
    df = pd.DataFrame(data)

    df.to_csv('canoo_case_study_data.csv', index=False)
    return df

df = main()
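
`main()` puts whole dicts and lists into the `Data` column, so the CSV stores their string representation in a single cell. One possible alternative, sketched below with the standard library only, is to flatten the nested values into one row per field before writing. `flatten` and `to_rows` are hypothetical names, and the sample reuses the placeholder values from the functions above.

```python
import csv
import io


def flatten(value, path=()):
    """Yield (dotted_field_name, scalar) pairs from nested dicts and lists."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield from flatten(item, path + (str(key),))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from flatten(item, path + (str(index),))
    else:
        yield ".".join(path), value


def to_rows(categories):
    """Turn {category: nested data} into a list of flat row dicts."""
    rows = []
    for category, payload in categories.items():
        for field, value in flatten(payload):
            rows.append({"Category": category, "Field": field, "Value": value})
    return rows


sample = {
    "Financial Performance": {
        "Revenue": "$xxx million",
        "Expense Structure": {"R&D": "xx%", "Marketing": "xx%"},
    },
    "Market Trends": ["Advancements in battery technology"],
}
rows = to_rows(sample)

# Write to an in-memory buffer; a real file handle works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Category", "Field", "Value"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The long format keeps every value in its own cell, which is easier to filter or pivot later than a stringified dict.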


In [18]:
df

| | Category | Data |
|---|---|---|
| 0 | Industry Information | {'Industry': 'Electric Vehicle Manufacturing',... |
| 1 | Competitors Information | [{'Name': 'Tesla', 'Market Share': 'xx%', 'Pro... |
| 2 | Market Trends | [Increasing demand for electric vehicles (EVs)... |
| 3 | Financial Performance | {'Revenue': '$xxx million', 'Profit Margins': ... |
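
When `canoo_case_study_data.csv` is read back, each `Data` cell is a plain string rather than a dict or list. A small sketch, assuming the cells contain only Python literals (as the placeholder data here does), of recovering the original object with `ast.literal_eval`. It uses an in-memory buffer and the standard `csv` module so it runs without touching the file.

```python
import ast
import csv
import io

original = {"Revenue": "$xxx million", "Profit Margins": "xx%"}

# Write one row; csv stringifies the dict with str() before writing it.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Category", "Data"])
writer.writerow(["Financial Performance", original])

# Read it back: the Data cell is now a string.
buffer.seek(0)
row = next(csv.DictReader(buffer))

# literal_eval parses Python literals only and does not execute code.
restored = ast.literal_eval(row["Data"])
```

`literal_eval` raises an error on anything that is not a literal, so it is a safer choice than `eval` for this kind of round trip.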
