<a href="https://www.kaggle.com/code/dascient/uacp-defining-powellscore-veracity-variables?scriptVersionId=132190261" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# UACP - Defining PowellScore & Veracity Variables

## [NLP - Sentiment Intensity Analyzer](https://github.com/cjhutto/vaderSentiment) Against Reporting Comments

Here we isolate only the pertinent variables from the original dataset. For the sake of computational efficiency, we use only 500 samples from the reports. We have also left most of the code cells below open, to keep the construction of both variables transparent.

In [1]:
# for the sake of expeditious analysis
import warnings
warnings.filterwarnings("ignore")
from IPython.display import clear_output
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from matplotlib import pyplot as plt
from shapely.geometry import Point
import geopandas as gpd
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from geopandas import GeoDataFrame
import matplotlib.colors as colors
import seaborn as sns
import random as r

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        #print('Files loaded.')
        
pd.set_option('display.max_colwidth', None)

# loading first nuforc dataframe
og_df1 = pd.read_csv('/kaggle/input/ufo-sightings/ufos.csv',header=0)
df = og_df1.dropna().copy()
og_df2 = pd.read_csv('/kaggle/input/d/NUFORC/ufo-sightings/scrubbed.csv',header=0)
df2 = og_df2.dropna().copy()

#############################################
# sanitize
# drop some columns, for now
df = df.drop(columns=['datetime','duration (hours/min)'])

# 'date posted' is easily convertible to timestamp values, so I'm going to work with that for now.
df['date posted'] = df['date posted'].astype('datetime64[ns]')


# length of comments
df['comment_length'] = [len(str(v[0:500])) for i,v in df.comments.items()]


# convert seconds to minutes
df["duration (minutes)"] = [int(v)/60 for i,v in df["duration (seconds)"].items()]


# creating Geo Point column for special use below
df['Geo Point'] = df.apply(lambda x:'%s, %s' % (x['latitude'],x['longitude']),axis=1)


# let's create subsets of our 80,000 reports here:
# with them we can apply conditionals, remove/analyze outliers,
# & refer back to smaller frames when we start running
# robust AI-ML modeling that would otherwise take much longer to run.

# let's create subsets from the main dataframe/reporting-data w/ respect to duration of observations
df_under100 = df[df["duration (minutes)"]<100]
df_under60 = df[df["duration (minutes)"]<60]

# random placeholder label column (five weighted categories, not real verification data) for future AI-ML modeling.
a=['balloon','spacejunk','sensor_malfunction','undentified','anomalous']     
df['verified'] = pd.Series(r.choices(a,k=len(df),weights=(50, 40, 30, 20, 10)),index=df.index)

# shape-focused
circles = df[df['shape'] == 'circle']
spheres = df[df['shape'] == 'sphere']
lights = df[df['shape'] == 'light']
teardrops = df[df['shape'] == 'teardrop']

# year-month
df['year_month'] = df['date posted'].dt.to_period('M')

# ca_oval
ca_oval = df[df.state=='ca'].reset_index(drop=True)
ca_oval = ca_oval[ca_oval['shape']=='oval']


# show
print("Our dataset(s).")
print(f"\nReports: {len(df)} non-null dataframe.")
print("\nMatrix:",df.shape[0],"rows,",df.shape[1],"columns")
df = df.sort_values('date posted',ascending=True).reset_index(drop=True)
df.tail(11).reset_index(drop=True).style.background_gradient(cmap ='seismic').set_properties(**{'font-size': '11px'}).set_properties(**{'text-align': 'left'})

/kaggle/input/d/NUFORC/ufo-sightings/complete.csv
/kaggle/input/d/NUFORC/ufo-sightings/scrubbed.csv
/kaggle/input/ufo-sightings/ufos.csv
Our dataset(s).

Reports: 66516 non-null dataframe.

Matrix: 66516 rows, 14 columns


Unnamed: 0,city,state,country,shape,duration (seconds),comments,date posted,latitude,longitude,comment_length,duration (minutes),Geo Point,verified,year_month
0,henderson,nv,us,fireball,20.0,"Very Strange Red Sphere Over Henderson, Nevada.",2014-05-08 00:00:00,36.039722,-114.981111,50,0.333333,"36.0397222, -114.9811111",spacejunk,2014-05
1,waxhaw,nc,us,circle,60.0,Bright orb that rapidly traveled west leaving a light trail and vanishing.,2014-05-08 00:00:00,34.924444,-80.743611,74,1.0,"34.9244444, -80.7436111",undentified,2014-05
2,mount hope (canada),on,ca,teardrop,2700.0,"Shell shaped object twitching it's way up in the sky, same time every other night.",2014-05-08 00:00:00,43.14,-79.9,88,45.0,"43.14, -79.9",balloon,2014-05
3,atkinson,nh,us,sphere,300.0,Flashlight made UFO disappear,2014-05-08 00:00:00,42.838333,-71.1475,29,5.0,"42.8383333, -71.1475",balloon,2014-05
4,lombard,il,us,circle,20.0,Bright red & yellow colored ball flying west to east in a straight line.,2014-05-08 00:00:00,41.88,-88.007778,76,0.333333,"41.88, -88.0077778",anomalous,2014-05
5,waxhaw,nc,us,circle,60.0,Bright orb that dimmed & got smaller before vanishing in the sky.,2014-05-08 00:00:00,34.924444,-80.743611,69,1.0,"34.9244444, -80.7436111",sensor_malfunction,2014-05
6,hialeah,fl,us,light,120.0,My wife stepped outside to get some laundry and she noticed a bright orange light hovering in the sky and called me to come out and see,2014-05-08 00:00:00,25.857222,-80.278333,135,2.0,"25.8572222, -80.2783333",spacejunk,2014-05
7,grayson,ga,us,sphere,20.0,"On May 3 around 9 pm I was studying Mars with an outdoor telescope on a clear night when a perfectly round , bright light appeared Sout",2014-05-08 00:00:00,33.894167,-83.955833,138,0.333333,"33.8941667, -83.9558333",sensor_malfunction,2014-05
8,currie,nc,us,light,120.0,Brilliantly Lit Flying Object With Reddish Orange Lights.,2014-05-08 00:00:00,34.4625,-78.101389,57,2.0,"34.4625, -78.1013889",balloon,2014-05
9,kuna,id,us,circle,600.0,"Bright Orange light(orbs) 1 multiplied to 8, Moving in all directions and weird patterns went on for 10 minutes until disappered. Loud",2014-05-08 00:00:00,43.491944,-116.419167,137,10.0,"43.4919444, -116.4191667",balloon,2014-05


In [2]:
# https://www.geeksforgeeks.org/python-sentiment-analysis-using-vader/
# https://github.com/cjhutto/vaderSentiment
# import the SentimentIntensityAnalyzer class from NLTK's bundled
# VADER module (nltk.sentiment.vader); it needs the 'vader_lexicon' resource.

from nltk.sentiment.vader import SentimentIntensityAnalyzer

# function to score the sentiment of a sentence; returns a one-row DataFrame.
# Note: despite the "%" labels, each value is the VADER proportion
# (pos/neg/neu) multiplied by the character length of the sentence.
def sentiment_scores(sentence):

    # Create a SentimentIntensityAnalyzer object.
    sid_obj = SentimentIntensityAnalyzer()
    sentiment_dict = sid_obj.polarity_scores(sentence)
    
    # collect the three length-scaled scores in a single row
    results = []
    results.append({"% Positive":sentiment_dict['pos']*len(sentence),
                    "% Negative":sentiment_dict['neg']*len(sentence),
                    "% Neutral":sentiment_dict['neu']*len(sentence)
                   })
    results = pd.DataFrame(results)
    return results

# Apply to df['comments'] column.
def NLP_PowellScore(commentsColumns):
    
    # obtain each comment for 'comments' column
    eachComment = [eachComment for i,eachComment in commentsColumns.items()]
    eachComment = pd.Series(eachComment)
                               
    # vader.variables.PowellScore
    PowellPositive = [v for v in list([sentiment_scores(sentimentAnalyzedComment)["% Positive"][0] for i,sentimentAnalyzedComment in eachComment.items()])]
    PowellNegative = [v for v in list([sentiment_scores(sentimentAnalyzedComment)["% Negative"][0] for i,sentimentAnalyzedComment in eachComment.items()])]
    PowellNeutral = [v for v in list([sentiment_scores(sentimentAnalyzedComment)["% Neutral"][0] for i,sentimentAnalyzedComment in eachComment.items()])]
    
    return PowellPositive,PowellNegative,PowellNeutral
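
The scaling inside `sentiment_scores` can be illustrated without running VADER. The sketch below uses a hand-written polarity dictionary as a stand-in for the output of `polarity_scores`; the helper name `scale_by_length` and the proportions are assumptions chosen for illustration (they are consistent with the first row of the sample table in this notebook). Because VADER's `pos`, `neg`, & `neu` values are proportions that sum to roughly 1, the three scaled values sum to roughly the character length of the comment.

```python
def scale_by_length(polarity, sentence):
    """Mirror the scaling in sentiment_scores: proportion * character count."""
    n = len(sentence)
    return {
        "% Positive": polarity["pos"] * n,
        "% Negative": polarity["neg"] * n,
        "% Neutral": polarity["neu"] * n,
    }

# Hand-written stand-in for SentimentIntensityAnalyzer().polarity_scores(...).
# These proportions are an assumption for illustration, not computed here.
sentence = "5 CREDITABLE WITNESSES"
polarity = {"pos": 0.737, "neg": 0.0, "neu": 0.263}

scaled = scale_by_length(polarity, sentence)
total = sum(scaled.values())

print(scaled)
print(total, len(sentence))  # the scaled values add up to the comment length
```

So the "%" columns are better read as character-weighted shares than as percentages.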

### Sample of 500 reports sorted by Veracity

In [3]:
# defining Powell Scores by sentiment outputs: Positive, Negative, Neutral.

# let's only take a small sample - scoring is slow, so this will take a few minutes, grab yourself some water...
# note: sample(50) draws 50 reports, while the heading above mentions 500; adjust the argument as needed.
robert = df.sample(50).copy()

# score the comments once & unpack the three lists
robert["PowellPositive"], robert["PowellNegative"], robert["PowellNeutral"] = NLP_PowellScore(robert['comments'])

# PowellScore 
robert["PowellScore"] = (robert["PowellPositive"]-robert["PowellNegative"])/robert["PowellNeutral"]

# veracity
robert["veracity"] = robert["PowellScore"]*robert["comment_length"] # comment_length could later be replaced by a lexicon-based weight.

columns = ['date posted','city','state','shape','comments','comment_length',\
        'latitude','longitude','PowellPositive',\
        'PowellNegative','PowellNeutral','PowellScore','veracity']

robert[columns].sort_values('veracity',ascending=False).head(20).reset_index(drop=True)\
        .style.background_gradient(cmap ='seismic').set_properties(**{'font-size': '11px'})

Unnamed: 0,date posted,city,state,shape,comments,comment_length,latitude,longitude,PowellPositive,PowellNegative,PowellNeutral,PowellScore,veracity
0,2002-10-15 00:00:00,teller,ak,disk,5 CREDITABLE WITNESSES,22,65.263611,-166.360833,16.214,0.0,5.786,2.802281,61.65019
1,2007-08-07 00:00:00,torrance,ca,light,Los Angeles - Bright lights seen by 4 witnesses during shuttle reentry. 12-15 bright lights moved around the sky for 20 minutes.,129,33.835833,-118.339722,31.476,0.0,97.524,0.322751,41.634921
2,2013-08-30 00:00:00,fence,wi,diamond,"Huge orange diamond shaped object hovering in the sky,turned yellow then slowly faded away",93,45.744444,-88.424167,26.133,0.0,66.867,0.390821,36.346314
3,2013-05-15 00:00:00,honolulu,hi,fireball,Bright fireball steaking over Waikiki.,39,21.306944,-157.858333,16.38,0.0,22.62,0.724138,28.241379
4,2000-12-02 00:00:00,vancouver (canada),bc,disk,Six year old witness was sitting on front lawn enjoying sun. Opened eyes and turned head to see med grey disklike object hovering over,134,49.25,-123.133333,23.182,0.0,110.818,0.20919,28.031439
5,2000-06-21 00:00:00,idaho city,id,light,bright light that vanishes,27,43.828611,-115.833611,13.284,0.0,13.716,0.968504,26.149606
6,2013-12-12 00:00:00,duncan,ok,other,Close encounter of the third kind - small blue 'gray' type alien.,71,34.502222,-97.9575,18.034,0.0,52.966,0.340483,24.174263
7,2006-02-14 00:00:00,dallas,tx,disk,"Three UFO' seen over Dallas, TX during daylight hours",59,32.783333,-96.8,16.992,0.0,42.008,0.404494,23.865169
8,2014-04-04 00:00:00,reno,nv,circle,"Bright cicularobject/light moving slowly in center of sky, then slowly eastward",82,39.529722,-119.812778,18.45,0.0,63.55,0.290323,23.806452
9,2011-01-31 00:00:00,fayetteville,nc,oval,Orange bright light over fayetteville Nc,40,35.0525,-78.878611,14.68,0.0,25.32,0.579779,23.191153
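
As a worked check of the two formulas, the sketch below recomputes PowellScore & veracity for the first row of the sample table above using plain Python. The function names `powell_score` & `veracity` are hypothetical helpers, & returning NaN when the neutral share is zero is an assumption about how undefined ratios might be handled (with pandas, the same division yields `inf` or `NaN` rather than raising an error).

```python
import math

def powell_score(pos, neg, neu):
    """(pos - neg) / neu, or NaN when the neutral share is zero."""
    if neu == 0:
        return math.nan  # assumption: treat undefined ratios as missing
    return (pos - neg) / neu

def veracity(score, comment_length):
    """PowellScore weighted by the (capped) comment length."""
    return score * comment_length

# First row of the sample table: "5 CREDITABLE WITNESSES"
pos, neg, neu, length = 16.214, 0.0, 5.786, 22

score = powell_score(pos, neg, neu)
ver = veracity(score, length)
print(round(score, 6))  # 2.802281
print(round(ver, 5))    # 61.65019

# The length scaling applied to pos/neg/neu cancels in the ratio,
# so the unscaled VADER proportions give the same PowellScore.
raw = powell_score(pos / length, neg / length, neu / length)
print(math.isclose(raw, score))  # True
```

One consequence is that comment length enters veracity only through the final multiplication, not through PowellScore itself.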


In [4]:
robert[['date posted','city','state','shape','duration (minutes)',\
        'comments','latitude','longitude','PowellPositive','PowellNegative',\
        'PowellNeutral','PowellScore','veracity']].sort_values('veracity', ascending=False).reset_index(drop=True)\
        .style.background_gradient(cmap ='seismic').set_properties(**{'font-size': '11px'})

Unnamed: 0,date posted,city,state,shape,duration (minutes),comments,latitude,longitude,PowellPositive,PowellNegative,PowellNeutral,PowellScore,veracity
0,2002-10-15 00:00:00,teller,ak,disk,30.0,5 CREDITABLE WITNESSES,65.263611,-166.360833,16.214,0.0,5.786,2.802281,61.65019
1,2007-08-07 00:00:00,torrance,ca,light,20.0,Los Angeles - Bright lights seen by 4 witnesses during shuttle reentry. 12-15 bright lights moved around the sky for 20 minutes.,33.835833,-118.339722,31.476,0.0,97.524,0.322751,41.634921
2,2013-08-30 00:00:00,fence,wi,diamond,6.0,"Huge orange diamond shaped object hovering in the sky,turned yellow then slowly faded away",45.744444,-88.424167,26.133,0.0,66.867,0.390821,36.346314
3,2013-05-15 00:00:00,honolulu,hi,fireball,0.083333,Bright fireball steaking over Waikiki.,21.306944,-157.858333,16.38,0.0,22.62,0.724138,28.241379
4,2000-12-02 00:00:00,vancouver (canada),bc,disk,0.5,Six year old witness was sitting on front lawn enjoying sun. Opened eyes and turned head to see med grey disklike object hovering over,49.25,-123.133333,23.182,0.0,110.818,0.20919,28.031439
5,2000-06-21 00:00:00,idaho city,id,light,4.0,bright light that vanishes,43.828611,-115.833611,13.284,0.0,13.716,0.968504,26.149606
6,2013-12-12 00:00:00,duncan,ok,other,0.083333,Close encounter of the third kind - small blue 'gray' type alien.,34.502222,-97.9575,18.034,0.0,52.966,0.340483,24.174263
7,2006-02-14 00:00:00,dallas,tx,disk,10.0,"Three UFO' seen over Dallas, TX during daylight hours",32.783333,-96.8,16.992,0.0,42.008,0.404494,23.865169
8,2014-04-04 00:00:00,reno,nv,circle,50.0,"Bright cicularobject/light moving slowly in center of sky, then slowly eastward",39.529722,-119.812778,18.45,0.0,63.55,0.290323,23.806452
9,2011-01-31 00:00:00,fayetteville,nc,oval,0.5,Orange bright light over fayetteville Nc,35.0525,-78.878611,14.68,0.0,25.32,0.579779,23.191153


## Ovals seen between Imperial Beach & Blythe, California

In [5]:
# only ovals: the 100 lowest-latitude (southernmost) California oval reports
ca_oval_162 = ca_oval.sort_values(['latitude','longitude']).head(100)
robert_ca_oval_162 = ca_oval_162.copy()  # work on a copy so ca_oval_162 stays unchanged

# score the comments once & unpack the three lists
robert_ca_oval_162["PowellPositive"], robert_ca_oval_162["PowellNegative"], robert_ca_oval_162["PowellNeutral"] = NLP_PowellScore(robert_ca_oval_162['comments'])

# PowellScore 
robert_ca_oval_162["PowellScore"] = (robert_ca_oval_162["PowellPositive"]-robert_ca_oval_162["PowellNegative"])/robert_ca_oval_162["PowellNeutral"]

# veracity
robert_ca_oval_162["veracity"] = robert_ca_oval_162["PowellScore"]*robert_ca_oval_162["comment_length"] # comment_length could later be replaced by a lexicon-based weight.


robert_ca_oval_162[['date posted','city','state','shape','comments','comment_length','duration (minutes)',\
        'latitude','longitude','PowellPositive','PowellScore','veracity']].sort_values(['veracity'],ascending=True).reset_index(drop=True)\
        .style.background_gradient(cmap ='seismic').set_properties(**{'font-size': '11px'})

Unnamed: 0,date posted,city,state,shape,comments,comment_length,duration (minutes),latitude,longitude,PowellPositive,PowellScore,veracity
0,2012-06-05 00:00:00,bermuda dunes,ca,oval,"Dark disc-shaped object moving across night sky with great(supersonic+) speed, no visible lights, no sound, VERY FAST.",127,0.05,33.742778,-116.288333,0.0,-0.293661,-37.294955
1,2000-03-16 00:00:00,san diego,ca,oval,"Already left message on answering machine. My son was out jogging about a block and a half from our home,and stopped in his tracks whe",138,2.0,32.715278,-117.156389,0.0,-0.228501,-31.53317
2,2003-04-22 00:00:00,pacific beach,ca,oval,Mystery craft scares Pacific Beach residents,44,30.0,32.797778,-117.239444,0.0,-0.47929,-21.088757
3,2002-01-11 00:00:00,vista,ca,oval,"helicopter shaped craft, many lights, no chopper blades, 10pm san diego california. 12-18-01.",102,0.416667,33.2,-117.241667,0.0,-0.183432,-18.710059
4,2004-06-04 00:00:00,avalon,ca,oval,"Round, tall craft over Catalina Island, two teardrop black crafts inter-weaving with another at low altitude",114,5.0,33.342778,-118.326944,0.0,-0.140251,-15.988597
5,1999-12-16 00:00:00,palm springs,ca,oval,I was traveling north by automobile when I observed an object traveling in a southernly direction at a very slow rate of speed. I stop,135,3.0,33.830278,-116.544444,0.0,-0.116071,-15.669643
6,2012-05-29 00:00:00,palm springs,ca,oval,3 red cigar shaped UFO's flying overhead. 1 stopped and remained stationary for about 30 seconds.,101,2.0,33.830278,-116.544444,0.0,-0.146789,-14.825688
7,2008-10-31 00:00:00,trabuco canyon,ca,oval,two orbs moving in the sky changing direction so quickly and in a fashion of no aircraft I know of can do,105,20.0,33.6625,-117.589444,0.0,-0.116071,-12.1875
8,2006-02-14 00:00:00,lake elsinore,ca,oval,Large semi transparent green flash,34,0.05,33.668056,-117.326389,0.0,0.0,0.0
9,2012-01-12 00:00:00,homeland,ca,oval,Multi-colored object sighted in night sky,41,60.0,33.743056,-117.108333,0.0,0.0,0.0


## Powell Variables in 3D
This is an interactive 3D chart that plots the date posted, veracity, & PowellScore variables. Points are colored by veracity & sized by comment_length.

By that reading, these charts encode up to five dimensions rather than three, if one considers the veracity & comment length of a report (shown as color & size) as 'features of a situation'.

In [None]:
import plotly.express as px
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

# graph
fig = px.scatter_3d(robert, x='date posted', y='veracity', z='PowellScore',
              color='veracity',
              size = 'comment_length',
              hover_name = 'city',
              hover_data=['city','state','comments'],              
              opacity=0.5,
              size_max=17
                   )
fig.show()

### Date Posted vs PowellScore vs PowellPositive

In [None]:
import plotly.express as px
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

# graph
fig = px.scatter_3d(robert, x='date posted', y='PowellScore', z='PowellPositive',
              color='veracity',
              size = 'comment_length',
              hover_name = 'city',
              hover_data=['city','state','comments'],              
              opacity=0.5,
              size_max=17
                   )
fig.show()

### Ovals seen between Imperial Beach & Blythe, California - Date Posted vs PowellNeutral vs PowellScore

In [None]:
import plotly.express as px
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

# graph
fig = px.scatter_3d(robert_ca_oval_162, x='date posted', y='PowellNeutral', z='PowellScore',
              color='veracity',
              size = 'comment_length',
              hover_name = 'city',
              hover_data=['city','state','comments','shape'],              
              opacity=0.5,
              size_max=17
                   )
fig.show()

These variables are still very much a work in progress, & there is currently no established process for defining them. Despite the disparate, disconnected, & wide-ranging set of skeptic/non-skeptic relational databases, we have managed to connect with organizations that promote open-source, public repositories, & most are willing to coordinate with one another in developing a UAP Reporting & Events Hub, in which all pertinent reports, sightings, measurements, & signatures would be populated by various factors from multiple disciplines & technologies. We will do our best to coordinate with key members of the UAP community in order to contribute to building out a “standardized” reporting mechanism in an intelligible & non-duplicative fashion. We are also looking for ways to gain access to real-time, current reports.

The goal would be to create something similar to an Order of Battle, so that reports at specific times & locations can be compared to past reports to augment credibility determination, as well as eventually be compared to known events that may explain them. Once those explanations are vetted, reports would be coded by likelihood of mundane vs anomalous, which would aid in the processing of similar events in the future.

In addition, we have already begun looking for trends over time, such as the time of day when reported events take place, & the type of object reported over the decades. The latter can be observed in the “Shapes by Share of Reports” chart, which provides indications of confirmation bias in observed behavior.

Finally, big data analysis (alongside robust AI|ML|DS modeling techniques) could also provide insight into the development of improved collection & reporting processes, which currently appear to be undefined, improving the quality of the data we receive. — K. Kolbe.

# Different NLP Methods

# DaS-VADER Sentiment Analyzer

In [None]:
from collections import Counter

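# join all comments into one string, split on whitespace, & count the 100 most common tokens
Counter(" ".join(df["comments"].astype(str)).split()).most_common(100)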

In [10]:
a = [v.split(' ') for i,v in df.comments.items()]
flatlist=[]
for sublist in a:
    for element in sublist:
        flatlist.append(element)
comments = pd.DataFrame(flatlist, columns=['words'])
comments

Unnamed: 0,words
0,Family
1,traveling
2,home
3,along
4,a
...,...
944552,lights
944553,seen
944554,over
944555,Parkersburg&#44


In [12]:
comments.words.value_counts().head(50)

the        27423
in         26437
a          22469
and        20823
           17285
of         16512
to         12726
light      11670
lights     11457
over       10989
I          10721
object      9841
was         9482
sky         8879
at          8396
with        8091
moving      6877
bright      6710
on          6623
then        5771
from        5674
it          5018
white       4787
saw         4649
shaped      4533
my          4532
that        4448
seen        4237
orange      4107
red         4008
very        3829
flying      3570
like        3483
A           3436
Bright      3399
for         3357
craft       3346
sky.        2986
no          2966
3           2892
across      2885
objects     2849
were        2751
about       2704
UFO         2699
out         2625
above       2543
up          2536
an          2516
by          2505
Name: words, dtype: int64
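
The counts above treat `sky` & `sky.`, `a` & `A`, & `bright` & `Bright` as different words, & the blank entry comes from `split(' ')` producing empty strings wherever two spaces are adjacent. A minimal sketch of one way to normalize tokens before counting follows; the `tokenize` helper, its regular expression, & the two sample comments are assumptions made up for illustration, not part of the dataset or the original analysis.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text & keep runs of letters, digits, & apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

# Made-up sample comments (not taken from the dataset).
comments = [
    "Bright light in the sky.",
    "A bright  light over the sky",  # note the double space
]

# Naive counting, as in the cell above: split on single spaces.
naive = Counter(w for c in comments for w in c.split(" "))

# Normalized counting: case-folded, punctuation & empty tokens dropped.
clean = Counter(w for c in comments for w in tokenize(c))

print(naive["sky"], naive["sky."], naive[""])    # 1 1 1
print(clean["sky"], clean["bright"], clean[""])  # 2 2 0
```

Removing stop words such as `the` & `in` (for example with the `STOPWORDS` set already imported from `wordcloud`) would be a natural next step.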