# JOB SCRAPER

* #### This notebook automates the extraction, cleaning, and transformation of job postings data from a specified website using Selenium WebDriver.
* #### It then uploads the processed data to GitHub, providing a streamlined pipeline for data acquisition and storage.

### >>

## This notebook performs the following tasks:

* ####  Installs the required Linux system dependencies.
* ####  Downloads and installs Chrome and Chromedriver.
* ####  Installs necessary Python packages.
* ####  Imports required libraries.
* ####  Initializes Selenium WebDriver.
* ####  Defines functions for configuring the Selenium WebDriver, scraping job details, accessing the GitHub repository, uploading data to GitHub, reading data back from GitHub, sending email notifications, and orchestrating the run.
* ####  Calls the main function to execute the entire process.

### >>

## **Section 1** - Installation and Setup

#### This section covers the installation of necessary dependencies, downloading and installing Chrome and Chromedriver, installing required Python packages, and importing libraries.

## Update Linux Dependencies

In [1]:
!apt-get update -y
!apt-get install -y \
libglib2.0-0 \
libnss3 \
libdbus-glib-1-2 \
libgconf-2-4 \
libfontconfig1 \
libvulkan1 \
gconf2-common \
libwayland-server0 \
libgbm1 \
udev \
libu2f-udev 
!apt --fix-broken install -y  

Get:1 https://packages.cloud.google.com/apt cloud-sdk InRelease [6361 B]
Hit:2 http://archive.ubuntu.com/ubuntu focal InRelease                         
Get:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]        
Get:4 https://packages.cloud.google.com/apt cloud-sdk/main amd64 Packages [617 kB]
Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]      
Get:6 http://packages.cloud.google.com/apt gcsfuse-focal InRelease [1225 B]    
Get:7 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]      
Get:8 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [3563 kB]
Get:9 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [1194 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [32.4 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [3959 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1489 kB]
Get:

## Installing Chrome

In [2]:
!wget -P /tmp https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/116.0.5845.96/linux64/chrome-linux64.zip
!unzip /tmp/chrome-linux64.zip -d /usr/bin/

--2024-03-19 08:41:14--  https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/116.0.5845.96/linux64/chrome-linux64.zip
Resolving edgedl.me.gvt1.com (edgedl.me.gvt1.com)... 34.104.35.123, 2600:1900:4110:86f::
Connecting to edgedl.me.gvt1.com (edgedl.me.gvt1.com)|34.104.35.123|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 145898081 (139M) [application/octet-stream]
Saving to: ‘/tmp/chrome-linux64.zip’


2024-03-19 08:41:16 (63.5 MB/s) - ‘/tmp/chrome-linux64.zip’ saved [145898081/145898081]

Archive:  /tmp/chrome-linux64.zip
  inflating: /usr/bin/chrome-linux64/MEIPreload/manifest.json  
  inflating: /usr/bin/chrome-linux64/MEIPreload/preloaded_data.pb  
  inflating: /usr/bin/chrome-linux64/chrome  
  inflating: /usr/bin/chrome-linux64/chrome-wrapper  
  inflating: /usr/bin/chrome-linux64/chrome_100_percent.pak  
  inflating: /usr/bin/chrome-linux64/chrome_200_percent.pak  
  inflating: /usr/bin/chrome-linux64/chrome_crashpad_handler  
  inflating: /usr/

In [3]:
!/usr/bin/chrome-linux64/chrome --version

Google Chrome for Testing 116.0.5845.96 


## Install ChromeDriver

In [4]:
!wget -P /tmp https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/116.0.5845.96/linux64/chromedriver-linux64.zip
!unzip /tmp/chromedriver-linux64.zip -d /usr/bin/

--2024-03-19 08:41:24--  https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/116.0.5845.96/linux64/chromedriver-linux64.zip
Resolving edgedl.me.gvt1.com (edgedl.me.gvt1.com)... 34.104.35.123, 2600:1900:4110:86f::
Connecting to edgedl.me.gvt1.com (edgedl.me.gvt1.com)|34.104.35.123|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7271942 (6.9M) [application/octet-stream]
Saving to: ‘/tmp/chromedriver-linux64.zip’


2024-03-19 08:41:24 (103 MB/s) - ‘/tmp/chromedriver-linux64.zip’ saved [7271942/7271942]

Archive:  /tmp/chromedriver-linux64.zip
  inflating: /usr/bin/chromedriver-linux64/LICENSE.chromedriver  
  inflating: /usr/bin/chromedriver-linux64/chromedriver  


In [5]:
!/usr/bin/chromedriver-linux64/chromedriver --version

ChromeDriver 116.0.5845.96 (1a391816688002153ef791ffe60d9e899a71a037-refs/branch-heads/5845@{#1382})


## Install Selenium

In [6]:
!apt install -y python3-selenium
!pip install selenium==3.141.0

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  apparmor chromium-browser chromium-chromedriver liblzo2-2 snapd
  squashfs-tools
Suggested packages:
  apparmor-profiles-extra apparmor-utils zenity | kdialog
The following NEW packages will be installed:
  apparmor chromium-browser chromium-chromedriver liblzo2-2 python3-selenium
  snapd squashfs-tools
0 upgraded, 7 newly installed, 0 to remove and 126 not upgraded.
Need to get 38.7 MB of archives.
After this operation, 174 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 apparmor amd64 2.13.3-7ubuntu5.3 [502 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal/main amd64 liblzo2-2 amd64 2.10-2 [50.8 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 squashfs-tools amd64 1:4.4-1ubuntu0.3 [117 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 snapd am

## Install PyGithub

In [7]:
!pip install PyGithub

Collecting PyGithub
  Downloading PyGithub-2.2.0-py3-none-any.whl (350 kB)
     350.2/350.2 kB 7.9 MB/s eta 0:00:00
Collecting pynacl>=1.4.0 (from PyGithub)
  Downloading PyNaCl-1.5.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (856 kB)
     856.7/856.7 kB 37.1 MB/s eta 0:00:00
Installing collected packages: pynacl, PyGithub
Successfully installed PyGithub-2.2.0 pynacl-1.5.0


## Import Libraries

In [8]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from github import Github
import datetime
import smtplib
from email.message import EmailMessage
import pytz
import pandas as pd
import json
import github
from kaggle_secrets import UserSecretsClient

### >>

## **Section 2** - Selenium WebDriver Initialization

#### Here, we initialize the Selenium WebDriver and define functions related to its configuration.

In [9]:
# Add configurable options
def add_driver_options(options):
    chrome_options = Options()
    for opt in options:
        chrome_options.add_argument(opt)
    return chrome_options

# Function to get WebDriver object
def get_driver() -> webdriver.Chrome:
    driver_config = {
        "options": [
            "--headless",
            "--no-sandbox",
            "--start-fullscreen",
            "--allow-insecure-localhost",
            "--disable-dev-shm-usage",
            "user-agent=Chrome/116.0.5845.96"
        ],
    }
    CHROME_BINARY_LOCATION = "/usr/bin/chrome-linux64/chrome"
    CHROMEDRIVER_BINARY_LOCATION = "/usr/bin/chromedriver-linux64/chromedriver"
    options = add_driver_options(driver_config["options"])
    options.binary_location = CHROME_BINARY_LOCATION
    driver = webdriver.Chrome(
        executable_path=CHROMEDRIVER_BINARY_LOCATION,
        options=options)
    return driver

### >>

## **Section 3** - Web Scraping

#### This section contains functions for scraping job details from the job site and a list of Indian cities from a Britannica reference page.

In [10]:
# Function to scrape job details and return data as a DataFrame
def get_source_data(job_role_lst: list, cnt: int) -> pd.DataFrame:
    fnl_lst = []
    key = 1

    # Iterating through job roles
    for job_role in job_role_lst:
        lst = job_role.split(' ')
        str_query = "+".join(lst)
        page = f'https://www.foundit.in/srp/results?query="{str_query}"'
        url = page.encode('ascii', 'ignore').decode('unicode_escape')

        driver = get_driver()
        driver.get(url)
        count = 0

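        # Scraping job details until cnt postings are collected for this role
        # (using <= here would scrape one more page after the limit is reached)
        while count < cnt: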
            x_pth = "/html\
                    /body\
                    /div[@id='srpThemeDefault']\
                    /div[@class='srpContainer']\
                    /div[@id='srpContent']\
                    /div[@class='srpCardContainer']\
                    /div[@class='srpResultCard']\
                    /div"
            elements = (driver.find_elements(By.XPATH, x_pth))

            for element in elements:
                try:
                    job_dict = {}
                    job_title = element.find_element(By.CLASS_NAME, "jobTitle").text
                    company_name = element.find_element(By.CLASS_NAME, "companyName").text
                    skills_str = ''

                    # Extracting skills
                    for i in element.find_elements(By.CLASS_NAME, "skillTitle"):
                        skill = i.text
                        if skill != '':
                            skills_str += skill + ','

                    sub_element = element.find_element(By.CLASS_NAME, "cardBody")
                    job_type_str = sub_element.find_element(By.XPATH, "div[1]/div[@class='details']").text
                    location_str = sub_element.find_element(By.XPATH, "div[2]/div[@class='details']").text
                    experience_str = sub_element.find_element(By.XPATH, "div[3]/div[@class='details']").text

                    # Storing job details in dictionary
                    job_dict['key'] = key
                    job_dict['job_role'] = job_role
                    job_dict['job_title'] = job_title
                    job_dict['company_name'] = company_name
                    job_dict['skills'] = skills_str[:-1]
                    job_dict['job_type'] = job_type_str
                    job_dict['location'] = location_str
                    job_dict['experience'] = experience_str
                    fnl_lst.append(job_dict)
                    count += 1
                    key += 1

                    if count == cnt:
                        break
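                except Exception:
                    # Skip cards that are missing any of the expected fields
                    pass

            # Move to the next results page; stop when no next-page arrow is found
            try:
                element.find_element(By.CLASS_NAME, "mqfisrp-right-arrow").click()
            except Exception:
                break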

        driver.quit()

    # Converting list of dictionaries to DataFrame
    df = pd.DataFrame(fnl_lst)
    return df
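
The search URL in `get_source_data` is built by joining the words of the role with `+` and wrapping them in literal double quotes. As a minimal sketch, the same query string can be produced with the standard library, which also percent-encodes the quotes. `build_search_url` is a hypothetical helper name, and whether the site treats encoded quotes exactly like raw ones is an assumption that has not been verified here.

```python
from urllib.parse import urlencode

# Same endpoint that get_source_data uses
BASE_URL = "https://www.foundit.in/srp/results"


def build_search_url(job_role: str) -> str:
    """Return an exact-phrase search URL for one job role (hypothetical helper)."""
    phrase = '"' + job_role + '"'  # wrap the role in double quotes
    # urlencode turns spaces into '+' and the quotes into %22
    return BASE_URL + "?" + urlencode({"query": phrase})


print(build_search_url("Data Engineer"))
# https://www.foundit.in/srp/results?query=%22Data+Engineer%22
```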

In [11]:
# Function to get cities in India as a DataFrame through web scraping
def get_cities_indian() -> pd.DataFrame:
    cities = []
    page = 'https://www.britannica.com/topic/list-of-cities-and-towns-in-India-2033033'
    url = page.encode('ascii', 'ignore').decode('unicode_escape')
    driver = get_driver()
    driver.get(url)
    div_element = driver.find_element(By.CLASS_NAME, 'reading-channel')
    lists = div_element.find_elements(By.TAG_NAME, 'li')

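    # Extracting city names (loop variable renamed so it no longer shadows the built-in list)
    for item in lists:
        txt = item.find_element(By.TAG_NAME, 'a').text.lower()
        cities.append(txt)

    driver.quit()  # Release the browser once scraping is finished
    df = pd.DataFrame(cities, columns=["city"])
    return df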

### >>

## **Section 4** - GitHub Interaction

#### Functions related to interacting with GitHub, such as getting the repository, uploading data, and extracting data, are defined here.

In [12]:
# Function to get GitHub repository object
def get_repository() -> github.Repository.Repository:
    access_token = UserSecretsClient().get_secret('access_token')
    repo_str = UserSecretsClient().get_secret('repo_str')
    g = Github(access_token)
    repo = g.get_repo(repo_str)
    return repo

In [13]:
# Function to upload DataFrame as CSV file to GitHub
def upload_dataframe_to_github(df: pd.DataFrame, folder: str):
    branch = 'master'
    repo = get_repository()
    csv_content = df.to_csv(index=False)

    # Uploading CSV file to GitHub
    if folder in ['source', 'consumption']:
        IST = pytz.timezone('Asia/Kolkata')
        dt_tm = str(datetime.datetime.now(IST))[:19].replace(' ', '-')
        file_str = f'{dt_tm}.csv'
        repo.create_file(folder+'/'+file_str, 'upload '+folder+' level data', csv_content, branch=branch)
        print(folder+' layer file uploaded!')
    elif folder == 'reference':
        file_str = 'indian_cities.csv'
        try:
            repo.create_file(folder+'/'+file_str, 'create indian cities data', csv_content, branch=branch)
            print(folder+' layer file uploaded!')
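        except github.GithubException:
            # The file already exists, so update it in place on the same branch
            file = repo.get_contents(folder+'/'+file_str, ref=branch)
            repo.update_file(folder+'/'+file_str, 'update indian cities data', csv_content, file.sha, branch=branch)
            print(folder+' layer file uploaded!')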
    else:
        raise Exception("folder is either 'source' or 'consumption' or 'reference'!")
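
The source and consumption files are named after the current time in India, formatted as `YYYY-MM-DD-HH:MM:SS`. A minimal sketch of that naming step is shown below using `strftime` and a fixed UTC+05:30 offset instead of `pytz` (India observes no daylight saving time, so the fixed offset gives the same wall-clock time). `timestamped_name` is a hypothetical helper, not part of the notebook.

```python
import datetime

# Fixed offset for India Standard Time (UTC+05:30)
IST = datetime.timezone(datetime.timedelta(hours=5, minutes=30), name="IST")


def timestamped_name(now: datetime.datetime, extension: str) -> str:
    """Build a file name such as 2024-03-19-08:41:14.csv from an aware datetime."""
    return now.astimezone(IST).strftime("%Y-%m-%d-%H:%M:%S") + "." + extension


# 03:11:14 UTC is 08:41:14 in IST
moment = datetime.datetime(2024, 3, 19, 3, 11, 14, tzinfo=datetime.timezone.utc)
print(timestamped_name(moment, "csv"))  # 2024-03-19-08:41:14.csv
```

Because every field is zero-padded and ordered from year down to second, these names sort chronologically when compared as plain strings.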

In [14]:
# Function to scrape Indian cities and update reference file in GitHub repo
def update_cities_data():
    # Get list of indian cities as dataframe
    df = get_cities_indian()
    # Upload dataframe as csv file in GitHub repo
    upload_dataframe_to_github(df=df, folder='reference')

In [15]:
# Iterate through the files and get the most recent one
def get_new_file(folder:str) -> str:
    if folder not in ['source', 'consumption', 'target', 'reference']:
        raise Exception("folder is either 'source' or 'consumption' or 'target' or 'reference'")

    repo = get_repository()
    contents = repo.get_contents(folder)
    path = ''

    # Finding the most recent file
    for c in contents:
        if path < c.path:
            path = c.path

    return path
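
`get_new_file` finds the most recent file by plain string comparison, which works because the timestamped file names sort chronologically. The sketch below shows the same idea with `max`; the paths are made-up examples and `newest_path` is a hypothetical name.

```python
def newest_path(paths):
    """Return the lexicographically largest path, or '' for an empty listing."""
    return max(paths, default="")


# Hypothetical repository listing for the 'source' folder
paths = [
    "source/2024-03-18-21:05:10.csv",
    "source/2024-03-19-08:41:14.csv",
    "source/2024-03-19-07:00:00.csv",
]
print(newest_path(paths))  # source/2024-03-19-08:41:14.csv
```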

In [16]:
# Function to get file path
def get_file_path(folder:str, path_type:str) -> str:
    github_url = 'https://github.com'
    repo_str = UserSecretsClient().get_secret('repo_str')
    branch = 'master'
    file_path = get_new_file(folder=folder)
    file_url = github_url + '/' + repo_str + '/blob/' + branch + '/' + file_path

    # Returning URL based on path type
    if path_type == 'url':
        return file_url
    elif path_type == 'raw':
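        # Swap only the host and the first '/blob/' segment, so a repository
        # name that itself contains 'github' is left untouched
        return file_url.replace('https://github.com/', 'https://raw.githubusercontent.com/', 1).replace('/blob/', '/', 1)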
    else:
        raise Exception("path_type is either 'url' or 'raw'")
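
The `'raw'` path type turns a GitHub page URL of the form `https://github.com/<owner>/<repo>/blob/<branch>/<path>` into its `raw.githubusercontent.com` counterpart. As a sketch under the assumption that the input always has that shape, the conversion can be done by splitting the URL into parts, which avoids touching owner or repository names. `blob_to_raw` and the example repository are hypothetical.

```python
def blob_to_raw(blob_url: str) -> str:
    """Convert a github.com blob URL into a raw.githubusercontent.com URL."""
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix):
        raise ValueError("expected a https://github.com/ URL")
    # owner / repo / 'blob' / everything after (branch and file path)
    owner, repo, kind, rest = blob_url[len(prefix):].split("/", 3)
    if kind != "blob":
        raise ValueError("expected a '/blob/' URL")
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{rest}"


url = "https://github.com/example-user/github-jobs/blob/master/source/2024-03-19-08:41:14.csv"
print(blob_to_raw(url))
# https://raw.githubusercontent.com/example-user/github-jobs/master/source/2024-03-19-08:41:14.csv
```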

In [17]:
# Function to convert reference cities CSV file to list
def cities_lst() -> list:
    ref_raw_path = get_file_path(folder='reference', path_type='raw')
    df = pd.read_csv(ref_raw_path)
    lst = df.city.tolist()
    return lst

In [18]:
# Function to upload dictionary as JSON file to GitHub
def upload_dict_to_github(data_dict: dict, folder: str):
    if folder == 'target':
        branch = 'master'
        repo = get_repository()
        IST = pytz.timezone('Asia/Kolkata')
        dt_tm = str(datetime.datetime.now(IST))[:19].replace(' ', '-')
        file_str = f'{dt_tm}.json'
        content=json.dumps(data_dict, indent=4)
        repo.create_file(folder+'/'+file_str, 'upload target data', content, branch=branch)
        print(folder+' layer file uploaded!')
    else:
        raise Exception("folder should be 'target'!")

### >>

## **Section 5** - Email Notification

#### The function for sending email notifications about the status of the process is defined in this section.

In [19]:
# Function to send status via email
def sendMail(reciever_id: str, exception=None):
    # Configuration
    sender_id = UserSecretsClient().get_secret('sender_id')
    pass_word = UserSecretsClient().get_secret('pass_word')
    git_link = UserSecretsClient().get_secret('git_link')
    IST = pytz.timezone('Asia/Kolkata')
    time_now = datetime.datetime.now(IST)

    # Creating email message
    message = "STATUS:\n"
    mail = EmailMessage()
    mail['From'] = sender_id
    mail['To'] = reciever_id

    # Handling status based on exception
    if exception is None:
        mail['Subject'] = "Extraction done " + str(time_now.strftime("at %H:%M:%S, on %d/%m/%Y"))
        message += "Data extraction, cleanup, and loading done successfully " + str(time_now.strftime("at %H:%M:%S, on %d %B %Y (%A)")) + ".\n"
        message += "Files uploaded at " + git_link + ".\n"
    else:
        mail['Subject'] = "Exception occurred " + str(time_now.strftime("at %H:%M:%S, on %d/%m/%Y"))
        message += "During the process of data extraction an exception occurred.\n"
        message += "Exception: " + str(exception) + ".\n"
        message += "Visit https://www.kaggle.com/ to resolve the issue.\n"

    message += "\nSent from kaggle notebook."
    mail.set_content(message)

    # Sending email
    server = smtplib.SMTP("smtp.gmail.com", 587)
    server.starttls()
    server.login(sender_id, pass_word)
    server.send_message(mail)
    server.close()
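
One way to make the notification step easier to check is to separate building the message from sending it, so the message can be inspected without contacting a mail server. The sketch below assumes that split; `build_status_mail` and `send_mail` are hypothetical names, and the addresses are placeholders. `send_mail` is defined but not called here.

```python
import smtplib
from email.message import EmailMessage


def build_status_mail(sender: str, receiver: str, subject: str, body: str) -> EmailMessage:
    """Assemble the status message without sending it."""
    mail = EmailMessage()
    mail["From"] = sender
    mail["To"] = receiver
    mail["Subject"] = subject
    mail.set_content(body)
    return mail


def send_mail(mail: EmailMessage, user: str, password: str,
              host: str = "smtp.gmail.com", port: int = 587) -> None:
    """Send the message over STARTTLS; the with-block closes the connection."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(mail)


mail = build_status_mail(
    "sender@example.com", "receiver@example.com",
    "Extraction done", "STATUS:\nAll layers uploaded.\n",
)
print(mail["Subject"])  # Extraction done
```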

### >>

## **Section 6** - Data Cleanup, Processing, and Transformation Functions

#### This section contains functions responsible for cleaning up raw data, processing it, and transforming it into a usable format.

In [20]:
# Function to extract cities from location string passed as parameter
def parse_location(location_str: str) -> str:
    location = [i.strip().lower() for i in location_str.replace('/', ',').split(",")]
    loc_lst = []

    for loc in location:
        if loc in cities:
            loc_lst.append(loc)

    lst = list(set(loc_lst)) # Remove duplicates
    result_string = ','.join(lst)
    return result_string
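
`parse_location` relies on a `cities` list that is created later, in the execution cell. A variant that receives the known cities as an argument can be tried in isolation; the sketch below also keeps the cities in first-seen order rather than the arbitrary order a `set` produces. The function name, the sample cities, and the sample location string are all made up for illustration.

```python
def parse_location_sketch(location_str: str, known_cities: set) -> str:
    """Return the known cities found in a location string, comma-separated."""
    parts = [p.strip().lower() for p in location_str.replace("/", ",").split(",")]
    # dict.fromkeys removes duplicates while preserving first-seen order
    matches = dict.fromkeys(p for p in parts if p in known_cities)
    return ",".join(matches)


known = {"bengaluru", "pune", "mumbai"}
print(parse_location_sketch("Bengaluru / Pune, Remote, pune", known))  # bengaluru,pune
```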

In [21]:
# Function to parse experience string passed and return required experience integer value
def parse_experience(experience_str: str) -> int:
    # replace_nan fills missing experience with the integer 0, so coerce to str first
    experience_str = str(experience_str).strip().lower()

    if '-' in experience_str:
        lst = [int(i) for i in experience_str.split(' ')[0].split('-')]
        res = sum(lst) // len(lst)
    else:
        if 'fresher' in experience_str:
            res = 0
        else:
            res = int(experience_str.split(' ')[0])

    return res
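
The parsing rules are: a range such as `2-5` becomes the floor of its midpoint, `fresher` becomes 0, and a single number is taken as is. A regex-based sketch of those rules follows. The input formats are assumptions inferred from the code in this cell, and unlike `parse_experience` this variant returns 0 instead of raising when no number is found.

```python
import re


def parse_experience_sketch(text) -> int:
    """Map an experience string to a whole number of years."""
    text = str(text).strip().lower()
    if "fresher" in text:
        return 0
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if not numbers:
        return 0  # assumption: treat unparseable text as no experience
    if "-" in text and len(numbers) >= 2:
        return (numbers[0] + numbers[1]) // 2  # floor of the range midpoint
    return numbers[0]


for sample in ["2-5 Years", "Fresher", "4 Years"]:
    print(sample, "->", parse_experience_sketch(sample))
```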

In [22]:
# Function to drop job-role names from the skills string and return the remaining skills as a comma-separated string
def process_skills(skill_str: str) -> str:
    skill_lst = skill_str.lower().split(',')
    job_roles = [
        "data engineer",
        "data analyst",
        "data architect",
        "data scientist",
        "machine learning engineer"
    ]

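    # Drop skills that merely repeat a job role. A new list is built because
    # removing items from a list while iterating over it skips entries.
    skill_lst = [j for j in skill_lst if not any(i in j for i in job_roles)]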

    result_string = ','.join(skill_lst)
    return result_string
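
Filtering a list while looping over it is easy to get wrong: calling `remove` on the list being iterated shifts the remaining items, so the next one is skipped. The short demonstration below uses made-up skill strings to contrast that pattern with building a new list.

```python
skills = ["data engineer", "data engineer jobs", "sql"]

# Pitfall: removing from the list being iterated skips the following item
buggy = list(skills)
for item in buggy:
    if "data engineer" in item:
        buggy.remove(item)

# Safer: build a new list that keeps only the wanted items
fixed = [item for item in skills if "data engineer" not in item]

print(buggy)  # ['data engineer jobs', 'sql']
print(fixed)  # ['sql']
```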

In [23]:
# Function to parse the job type string and return the matching job types as a comma-separated string
def parse_job_type(job_type_str: str) -> str:
    job_type_str = job_type_str.lower()
    type_lst = []

    if 'full' in job_type_str:
        type_lst.append('full time')
    if 'home' in job_type_str:
        type_lst.append('work from home')
    if 'contract' in job_type_str:
        type_lst.append('contract job')
    if 'remote' in job_type_str:
        type_lst.append('remote job')
    if 'part' in job_type_str:
        type_lst.append('part time')

    result_string = ','.join(type_lst)
    return result_string
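
The keyword checks in `parse_job_type` can also be written as a lookup table, which keeps the keywords and labels in one place. This is only a sketch of that alternative; `JOB_TYPE_KEYWORDS` and `parse_job_type_sketch` are hypothetical names. Note that the matching is by substring, so a keyword such as `part` also matches unrelated words.

```python
# (keyword, label) pairs in the same order as the if-chain
JOB_TYPE_KEYWORDS = [
    ("full", "full time"),
    ("home", "work from home"),
    ("contract", "contract job"),
    ("remote", "remote job"),
    ("part", "part time"),
]


def parse_job_type_sketch(job_type_str: str) -> str:
    """Return the labels whose keyword occurs in the text, comma-separated."""
    text = job_type_str.lower()
    return ",".join(label for keyword, label in JOB_TYPE_KEYWORDS if keyword in text)


print(parse_job_type_sketch("Full Time, Remote"))  # full time,remote job
print(parse_job_type_sketch("Partner role"))       # part time (substring match)
```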

In [24]:
# Function that replaces NaN with values based on columns
def replace_nan(df: pd.DataFrame) -> pd.DataFrame:
    nan_values = {
        'job_role' : '',
        'job_title' : '',
        'company_name' : '',
        'experience' : 0,
        'location' : '',
        'skills' : '',
        'job_type' : ''
    }

    for col, val in nan_values.items():
        df[col] = df[col].fillna(val)

    return df

In [25]:
# Function to clean data
def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    df = replace_nan(df)
    df['job_role'] = df['job_role'].str.lower()
    df['job_title'] = df['job_title'].str.lower()
    df['company_name'] = df['company_name'].str.lower()
    df['experience'] = df.apply(lambda x : parse_experience(x['experience']), axis=1)
    df['location'] = df.apply(lambda x : parse_location(x['location']), axis=1)
    df['skills'] = df.apply(lambda x : process_skills(x['skills']), axis=1)
    df['job_type'] = df.apply(lambda x : parse_job_type(x['job_type']), axis=1)
    return df

In [26]:
# Function to sort dictionary based on the values in descending order and return first n key-value pairs
def sort_dict_by_value(dictionary: dict, descending=True, n=None) -> dict:
    sorted_dict = dict(sorted(dictionary.items(), key=lambda item: item[1], reverse=descending))
    if n is not None:
        sorted_dict = dict(list(sorted_dict.items())[:n])
    return sorted_dict
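
For the common case of "top n by count, descending", the standard library's `collections.Counter.most_common` gives the same kind of result as `sort_dict_by_value`. The counts below are invented for illustration.

```python
from collections import Counter

# Hypothetical skill counts
counts = {"python": 7, "sql": 9, "spark": 4, "excel": 1}

# most_common(n) returns the n highest-count (key, value) pairs, largest first
top_two = dict(Counter(counts).most_common(2))
print(top_two)  # {'sql': 9, 'python': 7}
```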

In [27]:
# Function that aggregates consumption dataset and returns a dictionary
def roll_up_data(df_con: pd.DataFrame) -> dict:
    job_roles = [
        "data engineer",
        "data analyst",
        "data architect",
        "data scientist",
        "machine learning engineer"
    ]

    df_con = replace_nan(df_con)
    final_dict = {}

    for job_role in job_roles:
        final_dict[job_role] = {
            'company_dict' : {},
            'skill_dict' : {},
            'location_dict' : {},
            'job_type_dict' : {},
            'experience_dict' : {
                'entry level' : 0,
                'mid level': 0,
                'senior level': 0
            }
        }

    for index, row in df_con.iterrows():
        job_role = None

        for column, value in row.items():
            if column == 'job_role':
                job_role = value

            if column == 'company_name' and value != '':
                try:
                    final_dict.get(job_role).get('company_dict')[value] += 1
                except:
                    final_dict.get(job_role).get('company_dict')[value] = 1
            elif column == 'skills':
                for s in value.split(','):
                    if s != '':
                        try:
                            final_dict.get(job_role).get('skill_dict')[s] += 1
                        except:
                            final_dict.get(job_role).get('skill_dict')[s] = 1
            elif column == 'location':
                for l in value.split(','):
                    if l != '':
                        try:
                            final_dict.get(job_role).get('location_dict')[l] += 1
                        except:
                            final_dict.get(job_role).get('location_dict')[l] = 1
            elif column == 'job_type':
                for j in value.split(','):
                    if j != '':
                        try:
                            final_dict.get(job_role).get('job_type_dict')[j] += 1
                        except:
                            final_dict.get(job_role).get('job_type_dict')[j] = 1
            elif column == 'experience':
                if value < 3:
                    final_dict.get(job_role).get('experience_dict')['entry level'] += 1
                elif value >=3 and value <= 5:
                    final_dict.get(job_role).get('experience_dict')['mid level'] += 1
                else:
                    final_dict.get(job_role).get('experience_dict')['senior level'] += 1

    for k1, v1 in final_dict.items():
        for k2 in v1.keys():
            if k2 in ['company_dict', 'skill_dict', 'location_dict']:
                final_dict[k1][k2] = sort_dict_by_value(final_dict[k1][k2], descending=True, n=10)

    return final_dict
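
The experience buckets used in `roll_up_data` are: fewer than 3 years is entry level, 3 to 5 years is mid level, and anything above 5 is senior level. A small sketch of that rule as a standalone function, tallied with `Counter`, is shown below; `experience_level` is a hypothetical name and the sample values are made up.

```python
from collections import Counter


def experience_level(years: int) -> str:
    """Map years of experience to the bucket names used in the target file."""
    if years < 3:
        return "entry level"
    if years <= 5:
        return "mid level"
    return "senior level"


levels = Counter(experience_level(y) for y in [0, 2, 3, 5, 6, 10])
print(dict(levels))  # {'entry level': 2, 'mid level': 2, 'senior level': 2}
```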

### >>

## **Section 7** - Orchestration

#### This section contains functions responsible for orchestrating the entire process.

In [28]:
# Scraping and loading source data
def scrap_and_load():
    # List of job roles
    job_roles = [
        "Data Engineer",
        "Data Analyst",
        "Data Architect",
        "Data Scientist",
        "Machine Learning Engineer"
    ]
    # Get source data (job postings)
    df = get_source_data(job_role_lst=job_roles, cnt=500)
    print('web scraping done!')
    # Upload data to GitHub
    upload_dataframe_to_github(df=df, folder='source')

In [29]:
# Function that cleans source dataset and uploads consumption file
def upload_consumption_file():
    # Read recent source file into DataFrame
    raw_path = get_file_path(folder='source', path_type='raw')
    df_raw = pd.read_csv(raw_path)
    # Clean DataFrame
    df_con = clean_data(df_raw)
    # Upload file
    upload_dataframe_to_github(df_con, folder='consumption')

In [30]:
# Function that aggregates consumption dataset and uploads target file
def upload_target_file():
    # Read recent consumption file into DataFrame
    con_path = get_file_path(folder='consumption', path_type='raw')
    df_con = pd.read_csv(con_path)
    # Aggregate data
    fnl_dict = roll_up_data(df_con)
    # Upload file
    upload_dict_to_github(fnl_dict, folder='target')

### >>

## **Section 8** - Execution

#### This section calls the main function to execute the entire process.

In [31]:
# Entry point of the script
if __name__ == '__main__':
    # My email id
    reciever_id = UserSecretsClient().get_secret('reciever_id')
    # try/except block to handle exceptions and send mail
    try:
        # Upload cities data in GitHub
        update_cities_data()
        # Indian cities list
        cities = cities_lst()
        # Scrape and load source data
        scrap_and_load()
        # Upload consumption file
        upload_consumption_file()
        # Upload target file
        upload_target_file()   
        # Send mail if process succeeded
        sendMail(reciever_id)
    except Exception as e:
        print('exception occurred!')
        print(e)
        # Send mail if exception occurred
        sendMail(reciever_id, exception=e)

reference layer file uploaded!
web scraping done!
source layer file uploaded!
consumption layer file uploaded!
target layer file uploaded!


### -------------------------