## IntelDragon Tailored News Feed

IntelDragon Tailored News Feed is a scalable artificial intelligence platform that identifies relevant trending cyber threats to an organization, industry, company, or product based on a written description of the topic of interest.

### Add User Input

In [1]:
# Example search query

search_query = "cyber attacks"

In [2]:
# Example input text

input_text = """The College of Computing & Informatics is a national leader in information and technology education. With cutting edge curriculum and groundbreaking research led by a world-class faculty, our college is one of the only institutions in the nation that can equip you with the comprehensive knowledge, skills and hands-on experience to drive innovation and improve lives within any industry or career field you choose."""

### Load Modules

In [3]:
import subprocess
import sys
import newspaper
import json
from pygooglenews import GoogleNews
import pandas as pd
import numpy as np
from scipy import spatial
import datetime
from datetime import datetime, date, timedelta
import gensim
from gensim.models import Word2Vec

import nltk
from nltk.tokenize import RegexpTokenizer
from localVariables import stop, article_list

import warnings
warnings.filterwarnings(action = 'ignore')

### Query Google News

In [4]:
date.today() - timedelta(days=1)

gn = GoogleNews()
s = gn.search(search_query, from_=(date.today() - timedelta(days=1)).strftime('%Y-%m-%d'), to_=(date.today()).strftime('%Y-%m-%d'))

### Extract News Articles

In [None]:
# article_list = []

# for i in range(0,len(s["entries"])):
#   try:
#     url = s["entries"][i]["link"]
#     article = newspaper.Article(url=url, language='en')
#     article.download()
#     article.parse()
#     article ={
#       "title": str(article.title),
#       "text": str(article.text),
#       "authors": article.authors,
#       "published_date": str(article.publish_date),
#       "top_image": str(article.top_image),
#       "videos": article.movies,
#       "keywords": article.keywords,
#       "summary": str(article.summary),
#       "url": str(url)
#     }
#     article_list.append(article)
#   except:
#     pass

In [5]:
article_list[0]["title"]

'Into the Breach: Breaking Down 3 SaaS App Cyber Attacks in 2022'

### Get Natural Language Processing Model

In [6]:
title = []
data = []
  
for a in article_list:
    for sentence in a["text"].split("\n"):
        sentence_list = []
        for word in sentence.split(" "):
            sentence_list.append(word)
        title.append(a["title"])
        data.append(sentence_list)

# model = gensim.models.Word2Vec(data, min_count = 1, vector_size = 100, window = 5, sg = 1)
model = gensim.models.Word2Vec.load('Cybersecurity_Unigram_01.bin')

tokenizer = RegexpTokenizer(r'[0-9A-Za-z]*[A-Za-z][0-9A-Za-z]*')
def convert_to_tokens(text):
    intermediate = tokenizer.tokenize(text)
    intermediate = [(i.lower() if i[1:].lower()==i[1:] else i) for i in intermediate]
    intermediate = [i for i in intermediate if i not in stop]
    return intermediate

### Get News Article Vectors

In [17]:
article_titles = []
article_vectors = []
article_texts = []
article_url = []
for article in article_list:
    if "Types of Cyber Attacks:" in article["title"]:
        pass
    else:
        article_vector = []
        sentences = article["text"].split("\n")
        for line in sentences:
            line_vector = []
            for word in convert_to_tokens(line):
                try:
                    line_vector.append(model.wv.get_vector(word))
                except:
                    pass
            if len(line_vector)>0:
                article_vector.append(np.mean(line_vector, axis=0))
                
        article_vectors.append(article_vector)
        article_titles.append(article["title"])
        article_url.append(article["url"])
        article_texts.append(article["text"])
        
# data = np.array(article_vectors) # sentence vectors
labels = np.array(article_titles) # article titles
texts = np.array(article_texts) # article texts
urls = np.array(article_url)

In [18]:
len(article_vectors)

96

### Get User Input Text Vector

In [19]:
sentence_vector = []
line_vector = []
for sentence in input_text.replace("\n","").split("."):
    line_vector = []
    for word in sentence:
        try:
            line_vector.append(model.wv.get_vector(word))
        except:
            pass
        if len(line_vector)>0:
            sentence_vector = np.mean(line_vector, axis=0)

### Identify Relevant Results

In [34]:
# tree = spatial.KDTree(data)

# print(labels[tree.query(sentence_vector, k=5)[1][0]])
# print(urls[tree.query(sentence_vector, k=5)[1][0]]+"\n")

# print(labels[tree.query(sentence_vector, k=5)[1][1]])
# print(urls[tree.query(sentence_vector, k=5)[1][1]]+"\n")

# print(labels[tree.query(sentence_vector, k=5)[1][2]])
# print(urls[tree.query(sentence_vector, k=5)[1][2]]+"\n")

# print(labels[tree.query(sentence_vector, k=5)[1][3]])
# print(urls[tree.query(sentence_vector, k=5)[1][3]]+"\n")

# print(labels[tree.query(sentence_vector, k=5)[1][4]])
# print(urls[tree.query(sentence_vector, k=5)[1][4]]+"\n")

sorted_articles = []
for a in range(0,len(article_vectors)-1):
    relevance = 10.0
    for sentence in article_vectors[a]:
        dist = spatial.distance.cosine(sentence_vector, sentence)
        if dist < relevance:
            relevance = dist
    sorted_articles.append([a, relevance])
sorted_articles.sort(key=lambda l: l[1])

for a, score in sorted_articles[:10]:
    print(article_titles[a])
    print(score)
    print("")

CYBER SECURITY ENHANCED: NTT DATA BUSINESS SOLUTIONS AND SECURITYBRIDGE EXTEND THEIR PARTNERSHIP
0.2830878496170044

SmallSat Launch Company Teams with C8 Secure to Provide Cybersecurity Solutions for Space Industry
0.40529143810272217

Computer Services : Is Your Institution Prepared for the Computer-Security Incident Rule?
0.4232533574104309

Everbridge (EVBG), Atalait Tie Up to Provide CEM Solutions
0.4661410450935364

Chinese hackers targeted 7 Indian power hubs, govt says ops failed
0.47882723808288574

Pacific Institute Water Conflict Chronology Updated
0.48011887073516846

Aviation Cyber Security Market Is Booming Worldwide with Airbus, BAE Systems – Bloomingprairieonline
0.5093559920787811

IT Insight: MDR-Managed Detection and Response
0.5262599587440491

US Cyber Command reinforces Ukraine and allies amid Russian onslaught
0.5324267148971558

Cyber phases of a hybrid war. Catphishing in Israel. China snoops India's grid. Princes and dissidents.
0.5738224387168884



In [None]:
article_titles

In [None]:
len(article_titles)

In [None]:
len(labels)

In [None]:
2+2