# Tweets producer

The Tweets data is produced directly from the [sampled data](./fresh_data/) obtained from Tweeter. The code below gets a sample of values from the data sets and produces it into the main topic.

The first function below is a slightly modified version of the first sample_data function described in the [live-sentiment-analysis notebook](./live-sentinment-analysis.ipynb).

In [17]:
from os import walk
from os.path import join
import re
import glob
import pandas as pd
import time
import sentiment_analysis as sa

In [18]:
def sample_data(airline, filename, sample_size=10):
    sheet_df = pd.read_csv(filename)
    sample_df = sheet_df.sample(sample_size)

    for i in range(sample_df.iloc[:, 1:].shape[0]):
        row_json = sheet_df.iloc[i, 1:].to_dict()
        row_json['airline'] = airline
        yield row_json

In [19]:
producer = sa.BaseProducer('./config.properties', 'twitter-data-test')

%4|1658964393.649|CONFWARN|rdkafka#producer-1| [thrd:app]: Configuration property session.timeout.ms is a consumer property and will be ignored by this producer instance
%4|1658964393.659|CONFWARN|rdkafka#producer-2| [thrd:app]: Configuration property session.timeout.ms is a consumer property and will be ignored by this producer instance


In [20]:
data_path = './fresh_data/'
for (_, _, filenames) in walk(data_path):
    for f in filenames:
        matched = re.match(r'([a-z]*)_', f)

        if not matched:
            print(f"{f} is not a valid file skipping")
            continue
        
        airline = matched.groups()[0]
        data_file = join(data_path, f)
        
        gen = sample_data(airline, data_file, sample_size=3)

        producer.produce(gen)

Producing record: 1545206769236025345: {"created_at": "2022-07-08T00:42:56.000Z", "id": 1545206769236025345, "text": "@Matt_Starbucks @AlaskaAir This man has receipts \ud83d\ude02", "airline": "alaskaair"}
Producing record: 1545204008205225984: {"created_at": "2022-07-08T00:31:58.000Z", "id": 1545204008205225984, "text": "@missbeaux @AlaskaAir Congrats, Natalie! I am so excited for you, what you will bring to this role and how the guest will benefit from the products the team will launch! \u2764\ufe0f \u2764\ufe0f\u2764\ufe0f", "airline": "alaskaair"}
Producing record: 1545203074901233665: {"created_at": "2022-07-08T00:28:16.000Z", "id": 1545203074901233665, "text": "I love you @AlaskaAir", "airline": "alaskaair"}
Produced record to topic twitter-data-test partition [0] @ offset 20
Produced record to topic twitter-data-test partition [0] @ offset 21
Produced record to topic twitter-data-test partition [0] @ offset 22
3 messages were produced to topic twitter-data-test!
Producing record

In [21]:
data_path = './fresh_data/'

while True:
    for (_, _, filenames) in walk(data_path):
        for f in filenames:
            matched = re.match(r'([a-z]*)_', f)

            if not matched:
                print(f"{f} is not a valid file skipping")
                continue
            
            airline = matched.groups()[0]
            data_file = join(data_path, f)
            
            gen = sample_data(airline, data_file, sample_size=5)

            producer.produce(gen)

    time.sleep(60)



Producing record: 1545206769236025345: {"created_at": "2022-07-08T00:42:56.000Z", "id": 1545206769236025345, "text": "@Matt_Starbucks @AlaskaAir This man has receipts \ud83d\ude02", "airline": "alaskaair"}
Producing record: 1545204008205225984: {"created_at": "2022-07-08T00:31:58.000Z", "id": 1545204008205225984, "text": "@missbeaux @AlaskaAir Congrats, Natalie! I am so excited for you, what you will bring to this role and how the guest will benefit from the products the team will launch! \u2764\ufe0f \u2764\ufe0f\u2764\ufe0f", "airline": "alaskaair"}
Producing record: 1545203074901233665: {"created_at": "2022-07-08T00:28:16.000Z", "id": 1545203074901233665, "text": "I love you @AlaskaAir", "airline": "alaskaair"}
Producing record: 1545202030360178688: {"created_at": "2022-07-08T00:24:07.000Z", "id": 1545202030360178688, "text": "@missbeaux @AlaskaAir Was wondering! Congratulations on the new role! \ud83c\udf89", "airline": "alaskaair"}
Producing record: 1545201862453985282: {"created_

KeyboardInterrupt: 

## Sample data preliminary results

Once we've got the sample data being produced and sending messages into Elasticsearch by running three notebooks at the same time, we can check the results in the dashboard created so far.

![All Engines running](./imgs/all-running.png "Both Consumers and producers running")

The above screenshot shows the consumers running while the Tweets from the sample data is running

![Sentiments Dashboard](./imgs/sentiments-dashboard.png "Sentiment Analysis Dashboard in Kibana")

The above screenshot shows the Kibana dashboard with the data being recorded and grouped.

## Live Sentiment analysis

The last step is to run the retrieve new tweets, this time directly from Twitter's and send it into the producer as it was done above.

In [1]:
from searchtweets import ResultStream, gen_request_parameters, load_credentials

In [2]:
search_args = load_credentials("./.twitter_keys.yaml", yaml_key="search_tweets_v2",env_overwrite=False)

In [3]:
query = gen_request_parameters("@United", results_per_call=10, granularity=None, tweet_fields='created_at')
print(query)

{"query": "@United", "max_results": 10, "tweet.fields": "created_at"}


In [4]:
rs = ResultStream(request_parameters=query, max_results=10, max_pages=1, **search_args)

In [5]:
stream = rs.stream()

Given that the request is a generator, then we can just call the first next and get fresh tweets from the above mentioned resultstream.

In [7]:
next(stream)

{'data': [{'created_at': '2022-07-28T00:08:10.000Z',
   'id': '1552445776558981120',
   'text': 'A huge shout out and thank you to the @united gate agents at gate 8 (I believe one was named Daniel) this morning at @JohnWayneAir for working their asses off to get people home today! Good service should be recognized when people have to work under pressure.'},
  {'created_at': '2022-07-28T00:07:56.000Z',
   'id': '1552445715158568960',
   'text': '@TIA_EWING @united Some of them businessmen are just NASTY!'},
  {'created_at': '2022-07-28T00:07:51.000Z',
   'id': '1552445697513095171',
   'text': 'Flew home on @united’s direct route from Nice to Newark in Polaris! The 767 aircraft has 46 business class seats in a 1-1-1 configuration and it’s very comfortable! ✈️ https://t.co/RsghAghaqw'},
  {'created_at': '2022-07-28T00:07:11.000Z',
   'id': '1552445527815757827',
   'text': '@edccruz @weareunited @united hello how are you doing'},
  {'created_at': '2022-07-28T00:06:15.000Z',
   'id': '155

### Live producer

Below is the definition for the live tweet producer, which is somewhat based on the sampler that was seen at the beginning of the notebook, but getting data directly from Twitter's API.

In [15]:
import sentiment_analysis as sa
import functools
import time

search_args = load_credentials("./.twitter_keys.yaml", yaml_key="search_tweets_v2",env_overwrite=False)

producer = sa.BaseProducer('./config.properties', 'twitter-data-test')

handlers = ["United", "AmericanAir", "SouthwestAir", "Delta", "alaskaair", "AirCanada", "WestJet", "airtransat", "porterairlines", "FlairAirlines"]

while True:
    for h in handlers:

        airline = h
        
        tws_stream = sa.search_tweets(f"@{h}", search_args, results_per_call=10, max_results=10)

        tr = next(tws_stream)

        def tag(tweet):
            tweet['airline'] = airline
            return tweet

        tagged = map(tag, tr['data'])

        producer.produce(tagged)

    time.sleep(180)



%4|1658980230.363|CONFWARN|rdkafka#producer-15| [thrd:app]: Configuration property session.timeout.ms is a consumer property and will be ignored by this producer instance
%4|1658980230.369|CONFWARN|rdkafka#producer-16| [thrd:app]: Configuration property session.timeout.ms is a consumer property and will be ignored by this producer instance


Producing record: 1552501376839815168: {"created_at": "2022-07-28T03:49:06.000Z", "id": "1552501376839815168", "text": "@JoLuehmann @united United is absolutely atrocious! I don't have children, but as an Aunt, I do understand that children can get uncomfortable when their schedule &amp; routine are shaken up. United knows this as well, and should be very accommodating of this! You did a fantastic job and I applaud you!", "airline": "United"}
Producing record: 1552501183629115394: {"created_at": "2022-07-28T03:48:20.000Z", "id": "1552501183629115394", "text": "@JoLuehmann @united Hola @MelinnaTeatrina! Figured you might be interested in reading this story after the horrible experience you just had.", "airline": "United"}
Producing record: 1552500060641890305: {"created_at": "2022-07-28T03:43:53.000Z", "id": "1552500060641890305", "text": "RT @warriors: No. 4\u20e3 took the \ud83c\udfc6 back to his hometown and made a friend along the way.\n\n@united || Champions Tour https://t.co/IjL4S

KeyboardInterrupt: 