### Re-Tweet Statistics for Twitter Dataset:

<span style="font-family:Papyrus; font-size:1.25em;">

This function computes statistics related to re-Tweets.  Refer to the code comments in the cell below for specifics.<br>

</span>

In [None]:
def retweet_statistics(tweet_dataframe):
    """
    Re-Tweet statistics and visualizations for the CSV format Twitter dataset.
    Uses the Pandas "value_counts()" function to display counts of the unique values of the
    "retweeted_derived" attribute, then groups that attribute by the company the Tweet is associated with.

    Note: The raw JSON file does not have associated "company" information.

    :param tweet_dataframe: the Twitter dataset in a Pandas dataframe.
    :return: None.
    """

    # Select only rows with one associated company (excludes company combinations).
    # Note: this subset is currently unused; the statistics and plots below use the full dataframe.
    single_company_only_df = tweet_dataframe.loc[tweet_dataframe['multiple_companies_derived_count'] == 1]

    print("ReTweeted Attribute Statistics for entire CSV dataset:")
    print(tweet_dataframe["retweeted_derived"].value_counts())
    print()

    print("ReTweeted Statistics for CSV dataset by Company:")
    print("Number of Tweets for each \"retweeted_derived\" value by associated company: ")
    print(tweet_dataframe.groupby(['company_derived_designation', "retweeted_derived"]).size())
    print()

    # Graph the Statistics.
    print("Percentage of All Tweets that Are or Aren't Retweets by Associated Company: ")
    plt.figure()
    grid = sns.FacetGrid(tweet_dataframe[["retweeted_derived", 'company_derived_designation']],
                         col='company_derived_designation',
                         col_wrap=6, ylim=(0, 1))
    grid.map_dataframe(tweet_util.bar_plot, "retweeted_derived").set_titles('{col_name}') \
        .set_xlabels("ReTweet - 0.0 No, 1.0 Yes").set_ylabels("Percentage of All Tweets")
    plt.show()

    plt.figure()
    print(f"Percentage of All Tweets Associated with a Given Company by ReTweet Count")
    grid = sns.FacetGrid(tweet_dataframe[['tweet_id', 'company_derived_designation']],
                         col='company_derived_designation', col_wrap=6,
                         ylim=(0, 1),
                         xlim=(0, 10))
    grid.map_dataframe(tweet_util.bar_plot_zipf, 'tweet_id').set_titles('{col_name}').set_xlabels(
        'ReTweeted Count').set_ylabels("Percentage of All Tweets")
    plt.show()

    # Retweet counts of the top Retweeted Tweets.
    print(f"\nReTweet counts for the Top (most) Retweeted Tweets.\n")
    print(tweet_dataframe[['company_derived_designation', 'tweet_id']].groupby('company_derived_designation') \
          .apply(lambda x: x['tweet_id'].value_counts().value_counts(normalize=True) \
                 .sort_index(ascending=False).head(3)))

    # Portion of top Retweeted Tweets.
    print("\nWhat Percentage of All Tweets for a Given Company do the Top (most) Retweeted Tweets Comprise?\n")
    print(tweet_dataframe[['company_derived_designation', 'tweet_id']].groupby('company_derived_designation') \
          .apply(lambda x: x['tweet_id'].value_counts(normalize=True).head()))

<span style="font-family:Papyrus; font-size:1.25em;">
    
We call our data analysis function and pass in the CSV dataset imported at the beginning of the data analysis sections.<br>
    
</span>

In [None]:
# Compute and plot the re-Tweet statistics for the CSV dataset.
retweet_statistics(tweet_csv_dataframe)

<span style="font-family:Papyrus; font-size:1.25em;">

The first set of graphs shows, for each company, the proportion of its associated Tweets that are or are not re-Tweets.<br>

The second set of graphs shows, for each company, the proportion of its associated Tweets that have been re-Tweeted a given number of times.  For example, a bar at 4 on the x-axis with a height of 0.4 on the y-axis indicates that 40% of the Tweets associated with that company have been re-Tweeted 4 times.<br>

</span>
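
<span style="font-family:Papyrus; font-size:1.25em;">

The two quantities plotted above can be reproduced on a small scale with plain Pandas.  The sketch below is only an illustration: the toy data and the company names "company_a" and "company_b" are hypothetical, the helper "retweet_count_distribution" is not part of the project code, and it assumes (as the function above appears to) that the number of times a "tweet_id" occurs stands in for its re-Tweet count.<br>

</span>

```python
import pandas as pd

# Hypothetical toy data; only the column names come from retweet_statistics().
toy_df = pd.DataFrame({
    "company_derived_designation": ["company_a"] * 4 + ["company_b"] * 3,
    "tweet_id": [1, 1, 2, 3, 7, 7, 7],
    "retweeted_derived": [True, True, False, False, True, True, True],
})

# First set of graphs: share of each company's Tweets flagged as re-Tweets.
# The mean of a boolean column is the fraction of True values.
retweet_share = toy_df.groupby("company_derived_designation")["retweeted_derived"].mean()


def retweet_count_distribution(group):
    """Share of distinct tweet_ids that occur a given number of times."""
    occurrences = group["tweet_id"].value_counts()   # occurrences per tweet_id
    return occurrences.value_counts(normalize=True).sort_index()


# Second set of graphs: one distribution per company.
distributions = {
    company: retweet_count_distribution(group)
    for company, group in toy_df.groupby("company_derived_designation")
}

print(retweet_share)
for company, distribution in distributions.items():
    print(company)
    print(distribution)
```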

<span style="font-family:Papyrus; font-size:1.25em;">

**TODO: finish analyzing and answering questions posted in Trello board tasks.**<br>

Note: Re-Tweet text IS included in the "retweeted_status" meta object.<br>

</span>
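
<span style="font-family:Papyrus; font-size:1.25em;">

As a rough sketch of the note above, the original text of a re-Tweet could be read from the "retweeted_status" object of a raw Tweet JSON record.  The helper name "original_text_of_retweet" and the sample record are hypothetical, and the fallback to "full_text" assumes some records may have been collected in extended mode; this has not been checked against our dataset.<br>

</span>

```python
import json


def original_text_of_retweet(tweet):
    """Return the original Tweet's text if `tweet` is a re-Tweet, else None."""
    retweeted_status = tweet.get("retweeted_status")
    if not retweeted_status:
        return None
    # Assumption: extended-mode records store the text under "full_text";
    # otherwise fall back to the standard "text" field.
    return retweeted_status.get("full_text") or retweeted_status.get("text")


# Hypothetical sample record, shaped like one line of a raw Tweet JSON file.
sample_line = (
    '{"id": 2, "text": "RT @user: hello world", '
    '"retweeted_status": {"id": 1, "text": "hello world"}}'
)
sample_tweet = json.loads(sample_line)
print(original_text_of_retweet(sample_tweet))
```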

In [None]:
<span style="font-family:Papyrus; font-size:1.25em;">

Placeholder.

</span>


## Resources Used:

<span style="font-family:Papyrus; font-size:1.25em;">

**TODO: convert to annotated bibliography**

Dataset Files (obtained from Borg supercomputer):<br>

dataset_slo_20100101-20180510.json<br>
dataset_20100101-20180510.csv<br>

Note: These are large files not included in the project GitHub Repository.<br>


- [SLO-analysis.ipynb](SLO-analysis.ipynb)<br>
    -original SLO Twitter data analysis file from Shuntaro Yada.<br>


- https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json<br>
    -explanation of all data fields in JSON file format for Tweets.<br>


- https://datatofish.com/export-dataframe-to-csv/<br>
- https://datatofish.com/export-pandas-dataframe-json/<br>
    -saving Pandas dataframe to CSV/JSON<br>
    

- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html<br>
    -Pandas to_datetime() function call.<br>
    

- https://www.machinelearningplus.com/plots/matplotlib-tutorial-complete-guide-python-plot-examples/<br>
    -plotting with matplotlib.<br>


</span>

## TODOs:

<span style="font-family:Papyrus; font-size:1.25em;">

Implement further elements from Shuntaro Yada's SLO Twitter Dataset Analysis.<br>

</span>