# Exploratory notebook with SMAT
Testing out our ability to query SMAT data via the API and analyze it in a notebook.

In [None]:
%pip install -e ..

Note: if you recently updated the `smatter` package, run `%pip uninstall smatter -y` first, then reinstall it with the command above. You may also need to restart the kernel so it picks up the updated library.

Let's import the API library:

In [1]:
from smatter.api import SMAT

And create an instance of our SMAT client.

In [2]:
s = SMAT()

# Searching for `gendermapper`

We first heard about the anti-trans account `GenderMapper` from [this tweet](https://twitter.com/esqueer_/status/1574053263904759809) [[archived]](https://archive.ph/S6UMQ). We want to understand how pervasive this account might be in the places collected by SMAT.

We can start with a quick timeseries for the term `gendermapper`.

In [3]:
timeseries_gendermapper = s.timeseries(term='gendermapper')

Did we get content back?

In [4]:
timeseries_gendermapper

{'created_key': 'createdAtformatted',
 'took': 5,
 'timed_out': False,
 '_shards': {'total': 28, 'successful': 28, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 0, 'relation': 'eq'},
  'max_score': None,
  'hits': []},
 'aggregations': {'createdAtformatted': {'buckets': []}}}
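Rather than eyeballing the whole response, we can pull out just the two parts we care about: the total hit count and the per-day buckets. This is a minimal sketch that assumes every timeseries response has the same shape as the one printed above; `total_hits` and `daily_buckets` are hypothetical helper names, not part of `smatter`.

```python
def total_hits(response):
    """Return the total number of matching documents in a timeseries response."""
    return response["hits"]["total"]["value"]


def daily_buckets(response):
    """Return the per-day buckets, looked up via the key named in 'created_key'."""
    return response["aggregations"][response["created_key"]]["buckets"]


# A trimmed copy of the empty response printed above.
empty_response = {
    "created_key": "createdAtformatted",
    "hits": {"total": {"value": 0, "relation": "eq"}, "max_score": None, "hits": []},
    "aggregations": {"createdAtformatted": {"buckets": []}},
}

print(total_hits(empty_response))     # 0
print(daily_buckets(empty_response))  # []
```

In the notebook itself, `total_hits(timeseries_gendermapper)` would give the same answer without the sample dict.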

Meh. We should probably expand our search on two fronts: widening the `site` parameter to cover more sites, and widening the time window with the `since` and `until` parameters.

Let's first determine the "now" timestamp in the proper format for SMAT. We can also use `timedelta` for a 2 year time window.

In [5]:
import datetime

now = datetime.datetime.utcnow()
now_minus_2y = now - datetime.timedelta(days=730)

now = now.strftime('%Y-%m-%dT%H:%M:%S.%f')
now_minus_2y = now_minus_2y.strftime('%Y-%m-%dT%H:%M:%S.%f')
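If we end up trying several window sizes, the same steps can be wrapped in a small function. This is only a sketch: `smat_window` is a hypothetical helper, and it assumes SMAT accepts the format string used in the cell above. It uses `datetime.now(timezone.utc)` because `utcnow()` is deprecated as of Python 3.12. Also note that 730 days is only approximately two years when a leap day falls inside the window.

```python
import datetime

SMAT_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"  # same format string as the cell above


def smat_window(days, end=None):
    """Return (since, until) strings covering the `days` before `end`.

    `end` defaults to the current time in UTC.
    """
    if end is None:
        end = datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(days=days)
    return start.strftime(SMAT_TIME_FORMAT), end.strftime(SMAT_TIME_FORMAT)


since, until = smat_window(730)
print(since, until)
```

The returned pair can be passed straight to the `since` and `until` parameters.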

Now let's expand our sites. We can see the valid sites in the client's `SMAT_SITES` attribute.

In [6]:
s.SMAT_SITES

['rumble_video',
 'rumble_comment',
 'bitchute_video',
 'bitchute_comment',
 'rutube_video',
 'rutube_comment',
 'tiktok_video',
 'tiktok_comment',
 'lbry_video',
 'lbry_comment',
 '8kun',
 '4chan',
 'gab',
 'parler',
 'win',
 'poal',
 'telegram',
 'kiwifarms',
 'gettr',
 'wimkin',
 'mewe',
 'minds',
 'vk',
 'truth_social']

Any of those can be used as the in-scope site for the query.

So, let's start over, this time with `parler` and the two-year window:

In [7]:
timeseries_gendermapper_2 = s.timeseries(
    term='gendermapper',
    site = 'parler',
    since = now_minus_2y,
    until = now
)

Let's see what we got!

In [8]:
timeseries_gendermapper_2

{'created_key': 'createdAtformatted',
 'took': 14,
 'timed_out': False,
 '_shards': {'total': 28, 'successful': 28, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 2, 'relation': 'eq'},
  'max_score': None,
  'hits': []},
 'aggregations': {'createdAtformatted': {'buckets': [{'key_as_string': '2022-01-31T00:00:00.000Z',
     'key': 1643587200000,
     'doc_count': 1},
    {'key_as_string': '2022-02-01T00:00:00.000Z',
     'key': 1643673600000,
     'doc_count': 0},
    {'key_as_string': '2022-02-02T00:00:00.000Z',
     'key': 1643760000000,
     'doc_count': 0},
    {'key_as_string': '2022-02-03T00:00:00.000Z',
     'key': 1643846400000,
     'doc_count': 0},
    {'key_as_string': '2022-02-04T00:00:00.000Z',
     'key': 1643932800000,
     'doc_count': 0},
    {'key_as_string': '2022-02-05T00:00:00.000Z',
     'key': 1644019200000,
     'doc_count': 0},
    {'key_as_string': '2022-02-06T00:00:00.000Z',
     'key': 1644105600000,
     'doc_count': 0},
    {'key_as_string': '2022

OK! Some sites have content, other sites don't.
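To check sites systematically, we could loop over them and record the hit count for each. The sketch below queries one site per call, since the notebook only shows `site` taking a single name. `hits_per_site` is a hypothetical helper, and `FakeClient` is a stand-in so the example runs offline: its `gab` count is a placeholder, not real data, while the `parler` count of 2 matches the response above.

```python
def hits_per_site(client, term, sites, since, until):
    """Query one site at a time and return {site: total hit count}."""
    counts = {}
    for site in sites:
        response = client.timeseries(term=term, site=site, since=since, until=until)
        counts[site] = response["hits"]["total"]["value"]
    return counts


class FakeClient:
    """Stand-in for the SMAT client so this sketch runs without network access."""

    def timeseries(self, term, site, since, until):
        # Placeholder numbers: only the parler value comes from the notebook.
        value = 2 if site == "parler" else 0
        return {"hits": {"total": {"value": value, "relation": "eq"}}}


counts = hits_per_site(
    FakeClient(),
    "gendermapper",
    ["parler", "gab"],
    "2020-01-01T00:00:00.000000",
    "2022-01-01T00:00:00.000000",
)
# Keep only the sites that returned at least one hit.
print({site: n for site, n in counts.items() if n > 0})
```

With the real client this would be `hits_per_site(s, 'gendermapper', s.SMAT_SITES, now_minus_2y, now)`, at the cost of one request per site.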

For the above `parler` data, let's put it in a dataframe.

In [9]:
df_timeseries_gendermapper_2 = s.timeseries(
    term='gendermapper',
    site = 'parler',
    since = now_minus_2y,
    until = now,
    output = 'df'
)

In [10]:
df_timeseries_gendermapper_2

| | key_as_string | key | doc_count |
|---|---|---|---|
| 0 | 2022-01-31T00:00:00.000Z | 1643587200000 | 1 |
| 1 | 2022-02-01T00:00:00.000Z | 1643673600000 | 0 |
| 2 | 2022-02-02T00:00:00.000Z | 1643760000000 | 0 |
| 3 | 2022-02-03T00:00:00.000Z | 1643846400000 | 0 |
| 4 | 2022-02-04T00:00:00.000Z | 1643932800000 | 0 |
| ... | ... | ... | ... |
| 96 | 2022-05-07T00:00:00.000Z | 1651881600000 | 0 |
| 97 | 2022-05-08T00:00:00.000Z | 1651968000000 | 0 |
| 98 | 2022-05-09T00:00:00.000Z | 1652054400000 | 0 |
| 99 | 2022-05-10T00:00:00.000Z | 1652140800000 | 0 |
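Most rows have a `doc_count` of 0, so it helps to look only at the days with hits. The `key` column holds the same timestamp as `key_as_string`, expressed as milliseconds since the Unix epoch. Here is a small sketch using two rows copied from the output above as plain dicts; `bucket_date` is a hypothetical helper.

```python
import datetime

# Two rows copied from the output above, as plain dicts.
rows = [
    {"key_as_string": "2022-01-31T00:00:00.000Z", "key": 1643587200000, "doc_count": 1},
    {"key_as_string": "2022-02-01T00:00:00.000Z", "key": 1643673600000, "doc_count": 0},
]


def bucket_date(row):
    """Convert the epoch-millisecond `key` into a UTC calendar date."""
    seconds = row["key"] / 1000
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc).date()


# Keep only the days that actually have documents.
active_days = [(bucket_date(r), r["doc_count"]) for r in rows if r["doc_count"] > 0]
print(active_days)  # [(datetime.date(2022, 1, 31), 1)]
```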


Can we plot it? Yes!

In [None]:
%pip install altair

In [19]:
import altair as alt

In [20]:
alt.Chart(
    df_timeseries_gendermapper_2,
    padding={"left": 10, "top": 10, "right": 10, "bottom": 20}
).mark_bar().encode(
    x=alt.X('key_as_string', sort=['key_as_string']),
    y=alt.Y('doc_count')
)


With so few hits, the plot is not very informative.
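One way to make sparse data easier to read is to roll the daily buckets up into months before plotting. This is a sketch, not something the notebook ran: `monthly_counts` is a hypothetical helper, and it relies on `key_as_string` starting with `YYYY-MM`, as it does in the output above.

```python
from collections import Counter


def monthly_counts(rows):
    """Sum daily doc_count values into 'YYYY-MM' buckets, sorted by month."""
    totals = Counter()
    for row in rows:
        month = row["key_as_string"][:7]  # e.g. '2022-01'
        totals[month] += row["doc_count"]
    return dict(sorted(totals.items()))


# Three rows copied from the dataframe output above.
rows = [
    {"key_as_string": "2022-01-31T00:00:00.000Z", "doc_count": 1},
    {"key_as_string": "2022-02-01T00:00:00.000Z", "doc_count": 0},
    {"key_as_string": "2022-02-02T00:00:00.000Z", "doc_count": 0},
]
print(monthly_counts(rows))  # {'2022-01': 1, '2022-02': 0}
```

For the full dataframe, `monthly_counts(df_timeseries_gendermapper_2.to_dict('records'))` would feed it the same kind of row dicts.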