# CA2 - ML on Agriculture in Ireland and EU

## Overview

Sentiment analysis on data x

2 ML models on data y

...

## Auxiliary Functions

This section implements the auxiliary functions used throughout the notebook.

To use the Twitter API:

- A file named `.twitter_env` must exist in the user's home directory and contain the following keys:

    ```
    API_KEY=***
    API_KEY_SECRET=***
    BEARER_TOKEN=***
    ```
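
Before calling the API, it can help to confirm that all three keys are present and non-empty. The sketch below uses hypothetical helpers (`parse_env_text`, `missing_keys`) that are not part of the notebook. It only understands the plain `KEY=VALUE` lines shown above, whereas `python-dotenv` (used in the next cell) also handles quoting and other syntax. In the notebook, the same check could be run on the dictionary returned by `getEnvObj()`.

```python
# Hypothetical helpers, not part of the notebook: parse simple KEY=VALUE
# lines and report which required Twitter keys are missing or empty.
REQUIRED_KEYS = ("API_KEY", "API_KEY_SECRET", "BEARER_TOKEN")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not values.get(key)]


sample = "API_KEY=abc\nAPI_KEY_SECRET=def\n"
print(missing_keys(parse_env_text(sample)))  # ['BEARER_TOKEN']
```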

In [54]:
#!pip install python-dotenv
from dotenv import dotenv_values
from pathlib import Path
import os
import logging
import warnings
import requests
import json

# ignore warnings
warnings.filterwarnings('ignore')

# use a logger to help debugging
logger = logging.getLogger('mylogger')

# set logger level
logger.setLevel(logging.ERROR)

# path to the current directory
CURR_PATH = os.path.abspath(os.getcwd())

# path to the users directory
HOME_DIR = str(Path.home())

# path to dataset dir
DATASET_DIR = os.path.join(CURR_PATH, 'datasets')

# twitter env file
TWITTER_ENV_FILE = '.twitter_env'

# twitter recent search api url
TWITTER_API_SEARCH_URL = 'https://api.twitter.com/2/tweets/search/recent'

def getEnvObj():
    """Read the Twitter credentials from ~/.twitter_env as a dict, or return None if the file is missing."""
    env_path = os.path.join(HOME_DIR, TWITTER_ENV_FILE)

    if not os.path.exists(env_path):
        logger.error(f'Unable to read the environment file. Make sure a {TWITTER_ENV_FILE} file exists in your home directory.')
        return None

    return dotenv_values(env_path)

def createTwitterConfigFile():
    """Write the credentials from ~/.twitter_env to ~/.twitter-keys.yaml; return True on success."""
    config = getEnvObj()

    if config is None:
        logger.error("Unable to set Twitter's config file.")
        return False

    twitter_keys = f'''keys:
    access_token: {config["API_KEY"]}
    access_token_secret: {config["API_KEY_SECRET"]}
    bearer_token: {config["BEARER_TOKEN"]}
    '''
    keys_path = os.path.join(HOME_DIR, '.twitter-keys.yaml')
    # the file is only written, so plain 'w' mode is enough
    with open(keys_path, 'w') as file:
        file.write(twitter_keys)
    logger.info(f"Twitter keys file '{keys_path}' updated!")

    return True

def bearerAuth(r):
    """
    Auth callable for requests: add the Bearer token and User-Agent headers
    to the outgoing request and return it.
    """
    config = getEnvObj()

    if config is None:
        raise Exception('Unable to create Bearer authorization object.')

    r.headers['Authorization'] = f"Bearer {config['BEARER_TOKEN']}"
    r.headers['User-Agent'] = 'v2RecentSearchPython'

    return r

def connectToEndpoint(url, params_dict=None):
    """Send an authenticated GET request and return the parsed JSON body."""
    # timeout (in seconds) so that a stalled connection does not hang the notebook
    response = requests.get(url, auth=bearerAuth, params=params_dict or {}, timeout=30)

    logger.info(response.status_code)

    if response.status_code != 200:
        raise Exception(response.status_code, response.text)

    return response.json()

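def getRecentTweets(params=None, outfile=''):
    """Query the recent search endpoint, optionally save the JSON response, and return it."""
    json_response = connectToEndpoint(TWITTER_API_SEARCH_URL, params)

    # save the raw JSON response if an output path was given
    saveJsonToFile(json_response, outfile)

    # 'data' is absent from the response when no tweet matches the query
    print(f"{len(json_response.get('data', []))} tweets retrieved!")

    return json_response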

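def saveJsonToFile(json_data, outfile):
    """Write json_data to outfile as indented UTF-8 JSON; skip if outfile is empty."""
    if len(outfile) == 0:
        logger.warning('Output filename is empty!')
        return

    # create the target directory (e.g. ./datasets) if it does not exist yet
    os.makedirs(os.path.dirname(outfile) or '.', exist_ok=True)

    with open(outfile, 'w', encoding='utf-8') as f:
        json.dump(json_data, f, ensure_ascii=False, indent=4)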

def convertJsonToString(json_obj):
    return json.dumps(json_obj, indent=4, sort_keys=True, ensure_ascii=False)
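
`bearerAuth` works because `requests` accepts any callable as `auth`: it is called with the prepared request and must return it. A variant, sketched below under the assumption that the token is loaded once up front, subclasses `requests.auth.AuthBase` so that `.twitter_env` is not re-read on every request. `BearerTokenAuth` is a hypothetical name, not part of the notebook; with it, `connectToEndpoint` could pass `auth=BearerTokenAuth(getEnvObj()['BEARER_TOKEN'])`.

```python
import requests
from requests.auth import AuthBase


class BearerTokenAuth(AuthBase):
    """Attach a bearer token (and a User-Agent) to every request."""

    def __init__(self, token, user_agent="v2RecentSearchPython"):
        # the token is stored once instead of being read from disk per request
        self.token = token
        self.user_agent = user_agent

    def __call__(self, r):
        # requests calls this with the PreparedRequest before it is sent
        r.headers["Authorization"] = f"Bearer {self.token}"
        r.headers["User-Agent"] = self.user_agent
        return r


# Preparing (not sending) a request shows the headers the auth object adds.
prepared = requests.Request(
    "GET",
    "https://api.twitter.com/2/tweets/search/recent",
    params={"query": "inflation"},
    auth=BearerTokenAuth("dummy-token"),
).prepare()
print(prepared.headers["Authorization"])  # Bearer dummy-token
```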

## Sentiment Analysis

In this section, the Twitter API is used to collect recent tweets for the sentiment analysis (the recent search endpoint only covers the last seven days).

The search targets tweets about inflation, food prices, or agriculture in Europe, topics related to the agriculture theme of this project.

However, the quality of the collected data is limited: the search API matches individual keywords (tokens), so any tweet that contains, for example, both "inflation" and "Europe" is returned, whatever its actual topic.
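
One way to reduce this noise is to post-filter the returned texts on the client side, keeping only tweets that mention at least one term from a topic vocabulary. The sketch below assumes a hypothetical `AGRI_TERMS` list chosen purely for illustration; it matches whole words case-insensitively and is a crude relevance filter, not a classifier.

```python
import re

# Hypothetical vocabulary; tune it to the topics of interest.
AGRI_TERMS = ["agriculture", "farm", "farmer", "farmers", "crop", "crops",
              "food price", "food prices", "fertiliser", "fertilizer"]

# One regex with word boundaries, so that "crop" does not match "cropped".
_AGRI_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in AGRI_TERMS) + r")\b",
    re.IGNORECASE,
)


def is_agri_related(text):
    """Return True if the text mentions any term in AGRI_TERMS."""
    return _AGRI_RE.search(text) is not None


def filter_agri(texts):
    """Keep only the texts that mention an agriculture-related term."""
    return [text for text in texts if is_agri_related(text)]


print(filter_agri(["Food prices rise again", "Stocks fall in Europe", "New crop data"]))
# ['Food prices rise again', 'New crop data']
```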

### Data Collection - Twitter API

In [55]:
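query_params = {
    # keyword query: no retweets, no tweets with media, English only
    'query': '(inflation OR "food price" OR "agriculture") Europe -is:retweet -has:media lang:en',
    'tweet.fields': 'author_id',
    # user.fields only takes effect when the author is expanded
    'expansions': 'author_id',
    'user.fields': 'name',
    # 100 is the maximum page size of the recent search endpoint
    'max_results': 100,
}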

data = getRecentTweets(query_params, os.path.join(DATASET_DIR, 'twitter_data.json'))

100 tweets retrieved!
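
The recent search endpoint returns at most 100 tweets per request, which is why exactly 100 were retrieved above. When more matches exist, the response's `meta` object carries a `next_token`, and sending it back as the `next_token` query parameter fetches the next page. A minimal sketch with a hypothetical `collect_pages` helper; it takes the fetch function as an argument so it can be tried offline, and in the notebook it could be called as `collect_pages(lambda p: connectToEndpoint(TWITTER_API_SEARCH_URL, p), query_params, max_pages=3)`. Each extra page also counts against the endpoint's rate limit.

```python
def collect_pages(fetch, params, max_pages=5):
    """Follow meta.next_token across pages of a Twitter API v2 search.

    fetch: callable that takes a params dict and returns the parsed JSON.
    Returns the concatenated list of tweet objects from all pages read.
    """
    tweets = []
    page_params = dict(params)  # copy so the caller's dict is not modified
    for _ in range(max_pages):
        response = fetch(page_params)
        # 'data' is absent when a page has no matching tweets
        tweets.extend(response.get("data", []))
        next_token = response.get("meta", {}).get("next_token")
        if not next_token:
            break
        page_params["next_token"] = next_token
    return tweets


# Offline example: a fake fetch function that serves two pages.
fake_pages = {
    None: {"data": [{"text": "a"}, {"text": "b"}], "meta": {"next_token": "t1"}},
    "t1": {"data": [{"text": "c"}], "meta": {"result_count": 1}},
}
sample_tweets = collect_pages(lambda p: fake_pages[p.get("next_token")], {"query": "inflation"})
print([tweet["text"] for tweet in sample_tweets])  # ['a', 'b', 'c']
```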


In [60]:
import pandas as pd

tweets = [d['text'] for d in data['data']]

df = pd.DataFrame(tweets, columns=['tweets'])
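
The topics printed further down contain tokens such as `https` and `co` (from shortened `t.co` links) and `amp` (from the HTML-escaped `&amp;` that appears in tweet text). A light cleaning pass before vectorising would remove that noise. A minimal sketch with a hypothetical `clean_tweet` helper; it could be applied with `df['tweets'].apply(clean_tweet)` before fitting the vectorizer.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Drop URLs and @mentions, decode HTML entities and collapse whitespace."""
    text = URL_RE.sub(" ", text)      # t.co links produce the 'https' / 'co' tokens
    text = MENTION_RE.sub(" ", text)  # @user handles carry no topic information
    text = html.unescape(text)        # '&amp;' -> '&', the source of the 'amp' token
    return SPACE_RE.sub(" ", text).strip()


print(clean_tweet("Food prices &amp; inflation https://t.co/abc123 via @user"))
# Food prices & inflation via
```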

In [57]:
df.count()

tweets    100
dtype: int64

In [None]:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()

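# learn the vocabulary and build the sparse document-term count matrix
# (note: this overwrites the JSON response previously stored in `data`)
data = vectorizer.fit_transform(df['tweets'])

# Display the feature names (CountVectorizer sorts the vocabulary alphabetically);
# get_feature_names() was removed in scikit-learn 1.2
print(vectorizer.get_feature_names_out())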
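
`CountVectorizer` turns each tweet into a row of word counts. By default it lowercases the text, keeps tokens of two or more word characters (`token_pattern=r"(?u)\b\w\w+\b"`), and orders the vocabulary alphabetically. The pure-Python sketch below mimics those defaults on a toy corpus to show what the matrix contains; `bag_of_words` is a hypothetical illustration and ignores options such as n-grams or `min_df`. Passing `stop_words='english'` to `CountVectorizer` would drop words such as `the`, `in` and `to`, which dominate the topics below.

```python
import re
from collections import Counter

# CountVectorizer's default token pattern: runs of 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def bag_of_words(docs, stop_words=frozenset()):
    """Return (sorted vocabulary, document-term count matrix as nested lists)."""
    tokenised = [
        [tok for tok in TOKEN_RE.findall(doc.lower()) if tok not in stop_words]
        for doc in docs
    ]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for toks in tokenised:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix


vocab, matrix = bag_of_words(["Inflation in Europe", "Food price inflation in the EU"])
print(vocab)   # ['eu', 'europe', 'food', 'in', 'inflation', 'price', 'the']
print(matrix)  # [[0, 1, 0, 1, 1, 0, 0], [1, 0, 1, 1, 1, 1, 1]]
```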

In [64]:
from sklearn.decomposition import LatentDirichletAllocation

# number of topics to extract
t = 20

# LDA model with t topics, fitted in batch mode; random_state fixes the seed for reproducibility
lda = LatentDirichletAllocation(n_components=t, learning_method='batch', random_state=42)

# Train the model on the document-term count matrix
lda.fit(data)

# Print the topic-word matrix, shape (n_topics, n_features)
print(lda.components_)

# Get all feature names (get_feature_names() was removed in scikit-learn 1.2)
terms = vectorizer.get_feature_names_out()

[[ 0.05        0.05        0.05       ...  0.05        0.05
   0.05      ]
 [ 0.05        0.05        0.05       ...  0.05        1.05
   0.05      ]
 [ 0.05        0.05        3.52703678 ...  0.05        0.05
   0.05      ]
 ...
 [ 0.05        0.05        0.05       ...  1.05        0.05
   0.05      ]
 [ 0.05        1.05       11.31827306 ...  0.05        0.05
   0.05      ]
 [ 0.05        0.05        0.05       ...  0.05        0.05
   0.05      ]]
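
In scikit-learn, `components_` stores unnormalised values rather than probabilities: entry (i, j) can be viewed as a pseudo-count of how often word j was assigned to topic i, and it includes the topic-word prior. Because `topic_word_prior` was left unset, it defaults to `1 / n_components` = 1/20 = 0.05, which is why most entries printed above equal 0.05 (words essentially never assigned to that topic). Dividing each row by its sum turns it into a topic-word probability distribution. A minimal NumPy sketch, run on a hypothetical toy matrix rather than the notebook's output; in the notebook it would be called as `topic_word_distributions(lda.components_)`.

```python
import numpy as np


def topic_word_distributions(components):
    """Normalise each row of an LDA components_ matrix so that it sums to 1."""
    components = np.asarray(components, dtype=float)
    return components / components.sum(axis=1, keepdims=True)


# Toy pseudo-count matrix: 2 topics x 3 words, baseline 0.05 as with 20 topics.
toy_counts = np.array([[0.05, 1.05, 3.05],
                       [2.05, 0.05, 0.05]])
probs = topic_word_distributions(toy_counts)
print(probs.round(3))
```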


In [65]:
for topic_idx, topic in enumerate(lda.components_):
    print(f"Topic {topic_idx}:")
    # the 10 highest-weighted words, printed from lowest to highest weight
    print(" ".join([terms[i] for i in topic.argsort()[-10:]]))

Topic 0:
canceled r8h8kghyi1 chinese https co get now recession we so
Topic 1:
than in you and is to inflation europe of the
Topic 2:
why usa of inflation and europe 10 in to the
Topic 3:
high energy but europe inflation we to as in the
Topic 4:
and europe be not should has inflation this is the
Topic 5:
with that of inflation is has in europe and the
Topic 6:
an as can have it here at the but to
Topic 7:
for this war in to europe inflation co https the
Topic 8:
china to of euros 12 inflation spain billion 2022 10
Topic 9:
people it in this maybe should first and we to
Topic 10:
for ukraine agriculture to crisis is that of and the
Topic 11:
about that not how war europe of inflation in and
Topic 12:
https and of as but inflation it europe in the
Topic 13:
least have the well inflation be europe will to in
Topic 14:
https co and that europe inflation to the is in
Topic 15:
as be russia not are will for europe in inflation
Topic 16:
costs amp is europe worse be will higher inflation the
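
The loop above prints each topic's ten strongest words in ascending order of weight, so the last word carries the most weight; function words such as `the`, `in` and `to` dominate because no stop words were removed. A hypothetical helper that lists the top terms from strongest to weakest, shows their weights, and skips a given set of words might look like this. In the notebook it could be called as `top_terms(lda.components_, terms, exclude=ENGLISH_STOP_WORDS)`, with `ENGLISH_STOP_WORDS` imported from `sklearn.feature_extraction.text`.

```python
def top_terms(components, terms, n=10, exclude=frozenset()):
    """Return, per topic, the n highest-weighted (term, weight) pairs, highest first."""
    result = []
    for row in components:
        # word indices sorted by weight, largest first
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        picked = [(terms[j], float(row[j])) for j in order if terms[j] not in exclude]
        result.append(picked[:n])
    return result


toy_components = [[0.05, 2.0, 5.0], [3.0, 0.05, 1.0]]
toy_terms = ["europe", "inflation", "the"]
print(top_terms(toy_components, toy_terms, n=2, exclude={"the"}))
# [[('inflation', 2.0), ('europe', 0.05)], [('europe', 3.0), ('inflation', 0.05)]]
```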
