<a href="https://colab.research.google.com/github/SRI-CSL/signal-public/blob/signal-demonstration/colabs/signal_interest_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **SIGNAL**ing Interest Data

**Description:** Generate the `interest` dataframe via the SIGNAL API.

**Copyright 2022 SRI International.**

This project is licensed under the GNU GPLv3. See the [LICENSE](https://www.gnu.org/licenses/gpl-3.0.en.html) file for the full license text.

## &#9776; Preamble

Install the SIGNAL API client.

In [None]:
!curl https://signal.cta.sri.com/client > client.tgz
!tar xzf client.tgz
!pip install -r signal_api_client/requirements.txt
!pip install -e signal_api_client
!pip install ipympl
%cd /content/signal_api_client

Download the `funcs` utilities repository.

In [2]:
!git clone https://github.com/hsanchez/funcs.git &> /dev/null

## &#9776; Dependencies

In [3]:
import json
import os
import pathlib
import pickle
import re
import sys
import time
import warnings
import zipfile
from datetime import date, datetime
from typing import Any, Dict, List, Tuple

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [4]:
import funcs as utils

In [5]:
try:
    from google.colab import data_table, output
    data_table.disable_dataframe_formatter()
    output.enable_custom_widget_manager()
except Exception:
    print("Launched notebook locally")

In [8]:
from signal_api import signal

## &#9997; Configuration

In [6]:
warnings.filterwarnings("ignore")

In [7]:
%matplotlib inline
%config InlineBackend.figure_format='retina'

## &#128272; Login

In [9]:
signal.login()

username?: ··········
password?: ··········


True

In [10]:
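start_date = datetime(2020, 8, 1)
end_date = datetime(2020, 8, 2)

# timestamp_sent is an integer column of epoch seconds, so cast the bounds to int.
# Naive datetimes are interpreted in the local timezone (usually UTC on Colab).
start_ts, end_ts = int(start_date.timestamp()), int(end_date.timestamp())
email_df = signal.query_dataframe(f"SELECT * FROM email WHERE timestamp_sent > {start_ts} AND timestamp_sent < {end_ts};")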
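The query above derives its bounds from naive `datetime` objects, so the resulting epoch values depend on the timezone of the machine running the notebook. A minimal sketch of pinning the window to UTC follows. The helper name `epoch_bounds_utc` is hypothetical and not part of the SIGNAL client, and treating naive datetimes as UTC is an assumption about what the query intends.

```python
from datetime import datetime, timezone
from typing import Tuple


def epoch_bounds_utc(start: datetime, end: datetime) -> Tuple[int, int]:
    """Return (start, end) as integer epoch seconds, treating naive datetimes as UTC."""

    def to_epoch(moment: datetime) -> int:
        # Assumption: a datetime without tzinfo is meant to be UTC.
        if moment.tzinfo is None:
            moment = moment.replace(tzinfo=timezone.utc)
        return int(moment.timestamp())

    return to_epoch(start), to_epoch(end)


start_ts, end_ts = epoch_bounds_utc(datetime(2020, 8, 1), datetime(2020, 8, 2))
print(start_ts, end_ts)  # 1596240000 1596326400
```

With these bounds the query window is the same on Colab and on a local machine in any timezone.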

In [11]:
email_df.describe()

| | id | mailing_list_id | author_id | timestamp_sent | timestamp_recv |
|---|---|---|---|---|---|
| count | 276.0 | 276.0 | 276.0 | 276.0 | 276.0 |
| mean | 29334.01087 | 1.0 | 1005.271739 | 1596287000.0 | 1596291000.0 |
| std | 7046.209882 | 0.0 | 1046.011324 | 18184.04 | 18184.04 |
| min | 34.0 | 1.0 | 3.0 | 1596250000.0 | 1596253000.0 |
| 25% | 30614.75 | 1.0 | 35.0 | 1596272000.0 | 1596276000.0 |
| 50% | 31002.5 | 1.0 | 631.0 | 1596285000.0 | 1596289000.0 |
| 75% | 31425.25 | 1.0 | 1983.0 | 1596301000.0 | 1596305000.0 |
| max | 33521.0 | 1.0 | 2752.0 | 1596325000.0 | 1596329000.0 |


In [12]:
email_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 276 entries, 0 to 275
Data columns (total 15 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   id                   276 non-null    int64 
 1   url                  276 non-null    object
 2   mailing_list_id      276 non-null    int64 
 3   email_id             276 non-null    object
 4   message_id           276 non-null    object
 5   reply_to_url         112 non-null    object
 6   author_id            276 non-null    int64 
 7   timestamp_sent       276 non-null    int64 
 8   timestamp_recv       276 non-null    int64 
 9   subject              276 non-null    object
 10  body                 276 non-null    object
 11  clean_body           276 non-null    object
 12  thread_id            276 non-null    object
 13  persuasion           276 non-null    object
 14  reply_to_message_id  276 non-null    object
dtypes: int64(5), object(10)
memory usage: 32.5+ KB


In [13]:
email_df.head()

| | id | url | mailing_list_id | email_id | message_id | reply_to_url | author_id | timestamp_sent | timestamp_recv | subject | body | clean_body | thread_id | persuasion | reply_to_message_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 59 | https://lkml.iu.edu/hypermail/linux/kernel/200... | 1 | 20200801175938 | 20200801215806.2659-1-cengiz@kernel.wtf | | 35 | 1596319178 | 1596322778 | [PATCH v5] staging: atomisp: move null check t... | `find_gmin_subdev()` that returns a pointer to... | `find_gmin_subdev()` that returns a pointer to... | 20200731083856.GF3703480@smile.fi.intel.com | Unknown | 20200731083856.GF3703480@smile.fi.intel.com |
| 1 | 34 | https://lkml.iu.edu/hypermail/linux/kernel/200... | 1 | 20200801021814 | 202007312237.4F385EB3@keescook | | 23 | 1596262694 | 1596266294 | Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO an... | On Fri, Jul 31, 2020 at 11:51:28PM -0400, Arvi... | On Fri, Jul 31, 2020 at 11:51:28PM -0400, Arvi... | 20200731230820.1742553-1-keescook@chromium.org | Unknown | 20200801035128.GB2800311@rani.riverdale.lan |
| 2 | 35 | https://lkml.iu.edu/hypermail/linux/kernel/200... | 1 | 20200801021841 | 202008011403.PtFkHpqE%lkp@intel.com | https://lkml.iu.edu/hypermail/linux/kernel/200... | 24 | 1596262721 | 1596266321 | Re: [PATCH v3 21/23] device-dax: Add an 'align... | Hi Dan,\n\nThank you for the patch! Yet someth... | Hi Dan,\n\nThank you for the patch! Yet someth... | 159625241660.3040297.3801913809845542130.stgit... | Unknown | 159625241660.3040297.3801913809845542130.stgit... |
| 3 | 39 | https://lkml.iu.edu/hypermail/linux/kernel/200... | 1 | 202008010218140 | 202008011419.67BkWnAl%lkp@intel.com | | 24 | 1596262694 | 1596266294 | Re: [PATCH v3 21/23] device-dax: Add an 'align... | Hi Dan,\n\nThank you for the patch! Yet someth... | Hi Dan,\n\nThank you for the patch! Yet someth... | 159625241660.3040297.3801913809845542130.stgit... | Unknown | 159625241660.3040297.3801913809845542130.stgit... |
| 4 | 40 | https://lkml.iu.edu/hypermail/linux/kernel/200... | 1 | 20200801053958 | s5h7dui902e.wl-tiwai@suse.de | https://lkml.iu.edu/hypermail/linux/kernel/200... | 3 | 1596274798 | 1596278398 | Re: [PATCH] ALSA: seq: KASAN: use-after-free R... | On Sat, 01 Aug 2020 08:24:03 +0200,\n<qiang.zh... | On Sat, 01 Aug 2020 08:24:03 +0200,\n<qiang.zh... | 20200801062403.8005-1-qiang.zhang@windriver.com | Unknown | 20200801062403.8005-1-qiang.zhang@windriver.com |
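The `timestamp_sent` and `timestamp_recv` columns appear to hold Unix epoch seconds. This is an inference from their magnitude and from the way the email query filters on `datetime.timestamp()` values, not something the SIGNAL documentation is quoted as saying here. Under that assumption, a small sketch for making the values readable is shown below. The helper name `epoch_to_utc_iso` is hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_utc_iso(epoch_seconds: int) -> str:
    """Format integer epoch seconds as an ISO 8601 timestamp in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


# Values copied from the first row of email_df.head().
sent = epoch_to_utc_iso(1596319178)
recv = epoch_to_utc_iso(1596322778)
print(sent)  # 2020-08-01T21:59:38+00:00
print(recv)  # 2020-08-01T22:59:38+00:00
```

To convert a whole column at once, the pandas equivalent is `pd.to_datetime(email_df["timestamp_sent"], unit="s", utc=True)`.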


In [14]:
TABLES_QUERY = "SELECT * from information_schema.tables;"
df_tables = signal.query_dataframe(TABLES_QUERY)

In [15]:
df_tables.head()

| | table_catalog | table_schema | table_name | table_type | self_referencing_column_name | reference_generation | user_defined_type_catalog | user_defined_type_schema | user_defined_type_name | is_insertable_into | is_typed | commit_action |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | signal | public | scraped_projects | BASE TABLE | | | | | | YES | NO | |
| 1 | signal | public | scraped_patches | BASE TABLE | | | | | | YES | NO | |
| 2 | signal | public | scraped_patch_series | BASE TABLE | | | | | | YES | NO | |
| 3 | signal | public | diff | BASE TABLE | | | | | | YES | NO | |
| 4 | signal | public | thread | BASE TABLE | | | | | | YES | NO | |


In [16]:
df_tables.shape

(221, 12)
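`SELECT *` on `information_schema.tables` returns 221 rows, and that total likely includes system tables and views alongside the project's own tables. A narrower query, restricted to base tables in the `public` schema, might look like the sketch below. The column names come from the `df_tables.head()` output. `PUBLIC_TABLES_QUERY` and `list_public_tables` are hypothetical names, and the helper only assumes it is given a callable that behaves like `signal.query_dataframe`.

```python
from typing import Any, Callable, List

# Hypothetical narrower query; columns are those shown in df_tables.head().
PUBLIC_TABLES_QUERY = (
    "SELECT table_name "
    "FROM information_schema.tables "
    "WHERE table_schema = 'public' AND table_type = 'BASE TABLE' "
    "ORDER BY table_name;"
)


def list_public_tables(query_dataframe: Callable[[str], Any]) -> List[str]:
    """Run the query through the given callable and return the table names as a list."""
    result = query_dataframe(PUBLIC_TABLES_QUERY)
    return list(result["table_name"])
```

In this notebook the call would be `list_public_tables(signal.query_dataframe)`, assuming the session from `signal.login()` is still active.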