### Time Series Statistics for Twitter Dataset:

<span style="font-family:Papyrus; font-size:1.25em;">

This function analyzes when each Tweet was created and accumulates statistics on the number of Tweets created in the same time period, both across the entire dataset and by company.<br>

</span>
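
The kind of counting described above can be sketched on a few made-up rows. This is only an illustration, not the notebook's own code: the sample timestamps and the company labels `company_a` and `company_b` are hypothetical, and the monthly period is an arbitrary choice for the example.

```python
import pandas as pd

# Hypothetical sample rows; the column names mirror the ones the function reads.
sample = pd.DataFrame({
    "tweet_created_at": [
        "2018-01-03 10:15:00", "2018-01-20 08:00:00",
        "2018-02-11 17:30:00", "2018-02-12 09:45:00",
    ],
    "company_derived_designation": [
        "company_a", "company_b", "company_a", "company_a",
    ],
})
sample["tweet_created_at"] = pd.to_datetime(sample["tweet_created_at"])

# Reduce each timestamp to its calendar month.
month = sample["tweet_created_at"].dt.to_period("M")

# Tweets per month across the whole sample.
monthly_total = sample.groupby(month).size()

# Tweets per company and month.
monthly_by_company = sample.groupby(["company_derived_designation", month]).size()

print(monthly_total)
print(monthly_by_company)
```

Grouping on a derived period keeps the raw timestamps untouched, so the same dataframe can be regrouped by day or year without reloading the file.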

In [None]:
def tweet_count_by_timedate_time_series(created_at_attribute_file, file_type):
    """
    Compute time series statistics and visualizations on the Tweet creation time-date stamp.

    This function works for any JSON or CSV file that contains the attributes or columns
    "tweet_created_at", "company_derived_designation", and "multiple_companies_derived_count".

    Note: Ensure the input file is small enough to fit in RAM.  This function does not read data in chunks!

    :param created_at_attribute_file: the input file containing the "tweet_created_at" Tweet attribute.
    :param file_type: type of input file, either "csv" or "json".
    :return: None.
    """
    start_time = time.time()

    if file_type == "csv":
        twitter_data = pd.read_csv(f"{created_at_attribute_file}", sep=",")
    elif file_type == "json":
        twitter_data = pd.read_json(f"{created_at_attribute_file}",
                                    orient='records',
                                    lines=True)
    else:
        print("Invalid file type entered - aborting operation")
        return

    # Ensure the loaded data is a Pandas dataframe.
    dataframe = pd.DataFrame(twitter_data)

    # Select only rows with one associated company. (don't graph company combos)
    single_company_only_df = dataframe.loc[dataframe['multiple_companies_derived_count'] == 1]

    # 1st Plot.
    plt.figure(figsize=(18.5, 10.5), dpi=300)
    plt.title("Tweet Creation Time-Date Count by Year/Month/Day")
    plt.xlabel("Year/Month/Day")
    plt.ylabel("Tweet Count")
    pd.to_datetime(dataframe['tweet_created_at']).value_counts().resample('1D').sum().plot()
    plt.show()

    # 2nd Plot.
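    # FacetGrid creates its own figure, so no separate plt.figure() call is needed.
    # Note: seaborn 0.9 renamed the "size" parameter to "height".
    grid = sns.FacetGrid(dataframe[['tweet_created_at', 'company_derived_designation']],
                         row='company_derived_designation', height=2,
                         aspect=10,
                         sharey=False)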
    grid.map_dataframe(tweet_util.ts_plot, 'tweet_created_at').set_titles('{row_name}')
    plt.show()

    # # 3rd Plot.
    # # FIXME - not working as intended.
    # plt.figure()
    # grid = sns.FacetGrid(dataframe[['retweeted_derived', 'tweet_created_at', 'company_derived']], row='company_derived',
    #                      size=2, aspect=10,
    #                      sharey=False)
    # grid.map_dataframe(tweet_util.ts_plot_2, 'tweet_created_at').set_titles('{row_name}')
    # plt.show()

    end_time = time.time()
    time_elapsed = (end_time - start_time) / 60.0
    log.debug(f"The time taken to visualize the statistics is {time_elapsed} minutes")

<span style="font-family:Papyrus; font-size:1.25em;">

We call our data analysis function.  This function uses the "tweet_created_at" and "company_derived_designation" attributes/fields to compute the time series statistics.<br>

</span>

In [None]:
# Display Tweet count by time-date time series statistics.
tweet_count_by_timedate_time_series(
    "D:/Dropbox/summer-research-2019/jupyter-notebooks/attribute-datasets/selected-attributes-final.csv",
    "csv")

<span style="font-family:Papyrus; font-size:1.25em;">

In the first graph, we can see that more of the Tweets were created relatively recently, in 2017 and 2018.  The further back in time we go, the fewer Tweets we have.<br>

In the second series of graphs, we can see when the Tweets were created, broken down by the company each Tweet is associated with.<br>

</span>
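
The daily counts behind the first graph come from the chain `value_counts().resample('1D').sum()`. A minimal sketch of what that chain computes, using three made-up timestamps rather than the real dataset:

```python
import pandas as pd

# Hypothetical timestamps: two Tweets on May 1, none on May 2, one on May 3.
timestamps = pd.Series(pd.to_datetime([
    "2018-05-01 09:00:00",
    "2018-05-01 21:30:00",
    "2018-05-03 12:00:00",
]))

# value_counts() yields one count per distinct timestamp, indexed by timestamp.
# resample("1D").sum() then adds those counts up within each calendar day;
# days with no Tweets appear with a count of 0.
daily_counts = timestamps.value_counts().resample("1D").sum()

print(daily_counts)
```

Because empty days are kept as zeros, the plotted line drops to the axis on quiet days instead of interpolating between busy ones.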

## Resources Used:

<span style="font-family:Papyrus; font-size:1.25em;">

**TODO: convert to annotated bibliography**

Dataset Files (obtained from Borg supercomputer):<br>

dataset_slo_20100101-20180510.json<br>
dataset_20100101-20180510.csv<br>

Note: These are large files that are not included in the project GitHub repository.<br>


- [SLO-analysis.ipynb](SLO-analysis.ipynb)<br>
    -original SLO Twitter data analysis file from Shuntaro Yada.<br>


- https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json<br>
    -explanation of all data fields in JSON file format for Tweets.<br>


- https://datatofish.com/export-dataframe-to-csv/<br>
- https://datatofish.com/export-pandas-dataframe-json/<br>
    -saving Pandas dataframe to CSV/JSON<br>
    

- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html<br>
    -Pandas to_datetime() function call.<br>
    

- https://www.machinelearningplus.com/plots/matplotlib-tutorial-complete-guide-python-plot-examples/<br>
    -plotting with matplotlib.<br>


</span>

## TODOs:

<span style="font-family:Papyrus; font-size:1.25em;">

Implement further elements from Shuntaro Yada's SLO Twitter Dataset Analysis.<br>

</span>