### This exercise is very similar to the first one. The only change is that another dictionary is added as a value inside the dictionary that represents each tweet. That nested dictionary can then be used to print the words of each tweet that do not appear in the file 'Sentimientos.txt'.
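
As a minimal sketch of that idea, a tweet record can hold a second dictionary under one of its keys. The record and the sentiment table below are made up for illustration; only the `text` and `non_words` keys mirror the class in this notebook.

```python
# Hypothetical tweet record, trimmed to the one field the analysis reads.
tweet = {"text": "good morning world"}

# Assumed toy sentiment table; the real one is loaded from 'Sentimientos.txt'.
sentiments = {"good": 3}

# Store a nested dictionary under a new key of the tweet dictionary.
tweet["non_words"] = {
    word: 0 for word in tweet["text"].split() if word.lower() not in sentiments
}

# The nested dictionary is reached through the outer key.
for word, value in tweet["non_words"].items():
    print(f"'{word}' : {value}")
```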

### I added two functions for this exercise. The first builds, for each tweet, a dictionary of the words that do not appear in 'Sentimientos.txt' together with the tweet's average value, and updates the tweet's dictionary with it. The second prints those words and their assigned value, formatted to make the output easier to read.
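
One way to read "the tweet's average value" is the tweet's total sentiment score divided by its number of words. The helper below is a hypothetical sketch of that reading and is not the method used in the class, which divides the word count by the score. The name `average_score` and the toy table are assumptions made for illustration.

```python
def average_score(text, sentiments):
    """Return the total sentiment score of `text` divided by its word count."""
    words = text.split()
    if not words:  # Guard against empty text so there is no division by zero
        return 0.0
    total = sum(sentiments.get(word.lower(), 0) for word in words)
    return total / len(words)


toy_sentiments = {"good": 3, "bad": -3}  # Assumed values for illustration
print(average_score("Good morning world", toy_sentiments))
```

Checking the word list before dividing avoids a try/except, because here the denominator is the word count rather than the score.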

### TweetAnalysis takes two arguments when it is instantiated: the path to 'Sentimientos.txt' and the path to 'Tweets.txt'. Initialization runs three functions: set_sentiments, parse_tweets, and get_nonword_scores. The user can then print the words of each tweet and the tweet's average value by calling print_nonwords on the object.

### In this exercise I learned that two for loops can be placed inside another for loop. This will be useful in the future.
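
The pattern can be sketched on its own as follows. The tweets, the scores, and the `unknown` key are hypothetical and exist only to show one outer loop containing two inner loops.

```python
# Hypothetical data: a list of tweets, each reduced to its text.
tweets = [{"text": "good day"}, {"text": "bad bad day"}]
sentiments = {"good": 3, "bad": -3}  # Assumed toy scores

for tweet in tweets:  # Outer loop: one pass per tweet
    score = 0
    for word in tweet["text"].split():  # First inner loop: total the score
        score += sentiments.get(word.lower(), 0)
    unknown = []
    for word in tweet["text"].split():  # Second inner loop: collect unscored words
        if word.lower() not in sentiments:
            unknown.append(word)
    tweet["score"] = score
    tweet["unknown"] = unknown

print(tweets)
```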

In [89]:
class TweetAnalysis:
    """
    The TweetAnalysis object contains sentiments and tweets
    
    Args:
        sentiments_txt: .txt file containing words and their associated sentiment score
        twitter_data: .txt file containing JSON data of tweet records
        
    Attributes:
        sentiments: Dictionary of words and their sentiment score
        tweets: List of JSON dictionaries for tweet records
    """

    def __init__(self, sentiments_txt, twitter_data):
        self.set_sentiments(sentiments_txt)  # Populates self.sentiments
        self.parse_tweets(twitter_data)  # Populates self.tweets
        self.get_nonword_scores(self.tweets, self.sentiments)  # Adds 'score' and 'non_words' to each tweet
        
    def set_sentiments(self, sentiments_txt):
        """
        Generates dictionary of words and their sentiment score
        
        Args:
            sentiments_txt: .txt file containing words and their associated sentiment score
            
        Attributes:
            sentiments: Dictionary of words and their sentiment score
        """
        with open(sentiments_txt, 'r') as sentimientos: 
            self.sentiments = {}  # Create empty dictionary
            for linea in sentimientos:
                sentiment, value = linea.split("\t")  # Split word from sentiment score
                self.sentiments[sentiment] = int(value)  # Assign word and sentiment score to sentiments dictionary
    
    def parse_tweets(self, twitter_data):
        """
        Generates list of dictionaries from JSON data of tweet records
        
        Args:
            twitter_data: .txt file containing JSON data of tweet records
        
        Attributes:
            tweets: List of JSON dictionaries for tweet records
        """
        import json
        self.tweets = []  # Create empty list
        with open(twitter_data) as file:
            content = file.read().split('\n')  # Generates list of records, one per line
            for line in content:  # Iterate over the records
                try:
                    tweet = json.loads(line)  # Parse the JSON record into a dictionary
                    self.tweets.append(tweet)  # Append record to tweets list
                except ValueError:  # Raised for blank or malformed lines
                    continue  # Skip to the next record
                    
    def get_nonword_scores(self, tweets, sentiments):
        """
        Scores each tweet using the sentiments dictionary, then assigns every word of the tweet that is absent from
        the sentiments dictionary a value derived from that score, and adds those words to the tweet record generated by parse_tweets()
        
        Args:
            tweets: List of tweet records generated by parse_tweets (unused; the method reads self.tweets)
            sentiments: Dictionary of words and sentiment scores generated by set_sentiments (unused; the method reads self.sentiments)
        """
        for tweet in self.tweets:  # Iterate over each tweet
            tweet_score = 0  # Set score for tweet to zero
            word_count = 0  # Set word count in tweet to zero
            if 'text' in tweet:
                non_words = {}  # Dictionary for words absent from sentiments and their value
                for word in tweet['text'].split():  # Iterate over words in a tweet
                    word_count += 1  # Count words in tweet
                    if word.lower() in self.sentiments:  # True if the lowercased word has a sentiment score
                        tweet_score += self.sentiments.get(word.lower())  # Add that word's score to the tweet score
                tweet['score'] = tweet_score  # Create score key for tweet with corresponding sentiment score
                for word in tweet['text'].split():  # Iterate again over words in a tweet
                    if word.lower() not in self.sentiments:  # True if word is not in sentiments dictionary
                        try:  # Some tweets have a score of 0, so the division can fail
                            non_words[word] = word_count / tweet_score  # Value stored is word count divided by tweet score
                        except ZeroDivisionError:  # Runs when the tweet's score is zero
                            non_words[word] = tweet_score  # Store the tweet score itself, which is 0 here
                    tweet['non_words'] = non_words  # Add non_words to the tweet record under the key 'non_words'
                    
    def print_nonwords(self):
        """
        Prints words absent from the sentiments dictionary and the value assigned to each by get_nonword_scores

        Reads self.tweets as generated by parse_tweets and updated by get_nonword_scores; takes no arguments
        """
        for tweet in self.tweets:  # Iterate over each tweet
            if 'non_words' in tweet:  # code runs if tweet record was updated by get_nonword_scores
                print('  Siguente tweet  ')  # Formatting for easier readability by user upon use of print_nonwords function
                print('------------------')
                for word in tweet['non_words']:  # Print each word and the average tweet score for each tweet
                    print("'" + word + "'" + ' : ' + str(tweet['non_words'].get(word)))
                print('------------------')

Instantiate the TweetAnalysis class, passing the paths to the files "Sentimientos.txt" and "Tweets.txt" as arguments

In [90]:
TA = TweetAnalysis('Sentimientos.txt', 'Tweets.txt')

Print, for each tweet, a list of the words that do not appear in the file 'Sentimientos.txt', each followed by the tweet's average value

In [93]:
TA.print_nonwords()

  Siguente tweet  
------------------
'@Brenamae_' : 0
'I' : 0
'WHALE' : 0
'SLAP' : 0
'YOUR' : 0
'FIN' : 0
'AND' : 0
'TELL' : 0
'YOU' : 0
'ONE' : 0
'LAST' : 0
'TIME:' : 0
'GO' : 0
'AWHALE' : 0
------------------
  Siguente tweet  
------------------
'Metin' : 0
'Şentürk' : 0
'Twitterda' : 0
'@metinsenturk' : 0
'MUHTEŞEM' : 0
'ÜÇLÜ;' : 0
'SEN,' : 0
'BEN,' : 0
'MÜZİK' : 0
------------------
  Siguente tweet  
------------------
'RT' : 2.6666666666666665
'@byunghns:' : 2.6666666666666665
'😭' : 2.6666666666666665
'I' : 2.6666666666666665
'#틴탑' : 2.6666666666666665
'SO' : 2.6666666666666665
'MUCH' : 2.6666666666666665
'#쉽지않아' : 2.6666666666666665
'IS' : 2.6666666666666665
'GOING' : 2.6666666666666665
'TO' : 2.6666666666666665
'BE' : 2.6666666666666665
------------------
  Siguente tweet  
------------------
'que' : 0
'hdp' : 0
'maicon' : 0
'lo' : 0
'le' : 0
'hizo' : 0
'a' : 0
'david' : 0
'luiz' : 0
'jajajajajajajajajajaj,igual' : 0
'se' : 0
'jodio' : 0
'la' : 0
'carrera' : 0
---------------

'بالسينما' : 0
'يقبض' : 0
'الممثل' : 0
'ملايين' : 0
'كذلك' : 0
'المخرج' : 0
'والممتج' : 0
'المستفيد' : 0
'الاكبر,' : 0
'أما' : 0
'بفليم' : 0
'داعش' : 0
'السيناريست' : 0
'والمنتج' : 0
'خسروا' : 0
'لان' : 0
'الم…' : 0
------------------
  Siguente tweet  
------------------
'@SoccerSeki824' : 0
'まあ、そこら辺はまた（笑）' : 0
------------------
  Siguente tweet  
------------------
'ニルヴァーシュが２時をお伝え！' : 0
------------------
  Siguente tweet  
------------------
'@anooood1401' : 0
'سلام' : 0
------------------
  Siguente tweet  
------------------
'JUST' : 0
'AHHHHHH😍😍😍😍🙌😱😍😍😍' : 0
'http://t.co/eAdv1R3u4j' : 0
------------------
  Siguente tweet  
------------------
'@yuuna623815' : 0
'え、なにしたん' : 0
------------------
  Siguente tweet  
------------------
'RT' : 0
'@CHlLDHOODRUINER:' : 0
'Jay' : 0
'Z' : 0
'44' : 0
'and' : 0
'Beyoncé' : 0
'32' : 0
'so' : 0
'I'm' : 0
'not' : 0
'gonna' : 0
'stress' : 0
'it' : 0
'cause' : 0
'bae' : 0
'starts' : 0
'1st' : 0
'grade' : 0
'tomorrow' : 0
------------------
  Sigu

------------------
'RT' : 1.6666666666666667
'@Nashgrier:' : 1.6666666666666667
'morning' : 1.6666666666666667
'😁' : 1.6666666666666667
------------------
  Siguente tweet  
------------------
'@SoyNovioDeTodas' : 0
'¡Enhorabuena' : 0
'por' : 0
'tu' : 0
'250' : 0
'★' : 0
'tuit!' : 0
'http://t.co/RZBZIJ1RpJ' : 0
------------------
  Siguente tweet  
------------------
'"Acaba' : 0
'bu' : 0
'tweette' : 0
'bahsettiği' : 0
'kişi' : 0
'ben' : 0
'miyim?"diye' : 0
'bir' : 0
'şey' : 0
'var.' : 0
'Metin' : 0
'Şentürk' : 0
'Twitterda' : 0
'@metinsenturk' : 0
------------------
  Siguente tweet  
------------------
'@attitydisk' : 0
'mansdominerade' : 0
'e' : 0
'oljekonferenser' : 0
'båda?' : 0
------------------
  Siguente tweet  
------------------
'RT' : 0
'@1DUpdateBRA:' : 0
'Até' : 0
'o' : 0
'dia' : 0
'16' : 0
'de' : 0
'setembro,' : 0
'NÃO' : 0
'é' : 0
'preciso' : 0
'votar!' : 0
'Por' : 0
'enquanto' : 0
'a' : 0
'votação' : 0
'apenas' : 0
'para' : 0
'os' : 0
'que' : 0
'ainda' : 0
'não' : 0
'f