## 1. Some Context. I switched the web scraping from PRAW to the Pushshift API, since PRAW limits how much can be scraped. With the API, I scraped the r/adhd subreddit 18 times, each time searching for a different descriptor of an ADHD symptom. I based the API searches on the "Adult ADHD Clinical Diagnostic Scale":  
https://datashare.nida.nih.gov/sites/default/files/studydocs/276/MDS0007_CRF.pdf
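
A minimal sketch of how one such search could be expressed, assuming the public Pushshift comment-search endpoint and its `q`, `subreddit`, and `size` parameters as they existed when this data was collected (submissions had a parallel endpoint). The helper name `build_pushshift_url` and the example term are hypothetical; the actual queries used for the 18 scrapes are not shown in this notebook. The sketch only builds the request URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumed endpoint: Pushshift's comment search as it was publicly documented.
PUSHSHIFT_COMMENT_SEARCH = "https://api.pushshift.io/reddit/search/comment/"

def build_pushshift_url(term, subreddit="adhd", size=100):
    """Return a search URL for one symptom descriptor (hypothetical helper)."""
    params = {"q": term, "subreddit": subreddit, "size": size}
    return PUSHSHIFT_COMMENT_SEARCH + "?" + urlencode(params)

# One URL per descriptor; "careless" is only an illustrative term.
url = build_pushshift_url("careless")
print(url)
```

Running one search per descriptor and saving each result set to its own `.txt` file would give the one-document-per-symptom layout used in the rest of the notebook.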


In [62]:
!python -m pip install -U gensim
import gensim 
from nltk.tokenize import sent_tokenize
from nltk.tokenize.treebank import TreebankWordTokenizer
import nltk
nltk.download('punkt')
import glob
from pathlib import Path



[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\mthom\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


##  Some pre-processing and doc segmentation out of order

In [63]:
import os

base_dir = '/Users/mthom/OneDrive - Emory University/Senior 1/QTM 340/CORPORA/' ##Found it easier to run the operations on my computer locally.
all_docs = [] # our list which will store the text of each doc; empty for now

docs = os.listdir(base_dir) # get a list of all the files in the directory

for doc in docs: # iterate through  docs
    if not doc.startswith('.'): # skip hidden files (names starting with '.')
        with open(base_dir + doc, "r", encoding="utf-8") as file: # force unicode conversion to keep PCs happy
            text = file.read() # read in the file as a single text string
            all_docs.append(text) # append it to the all_docs list

# Should be 18
len(all_docs)


18
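
The document titles printed later come from a separate `glob` call, so they only line up with `all_docs` if both listings return the files in the same order. A minimal sketch of one way to avoid relying on that, assuming every document is a `.txt` file directly inside the corpus folder; `load_corpus` is a hypothetical helper and not part of the original notebook.

```python
import tempfile
from pathlib import Path

def load_corpus(folder):
    """Read every .txt file in `folder`; return (titles, texts) in matching order."""
    paths = sorted(Path(folder).glob("*.txt"))  # sorted so the order is reproducible
    titles = [p.stem for p in paths]
    texts = [p.read_text(encoding="utf-8") for p in paths]
    return titles, texts

# Demo on a throwaway folder with two small files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "blurts.txt").write_text("I blurt things out.", encoding="utf-8")
    Path(tmp, "avoid.txt").write_text("I avoid long tasks.", encoding="utf-8")
    demo_titles, demo_docs = load_corpus(tmp)

print(demo_titles)
```

With the real corpus this would be called as `titles, all_docs = load_corpus(base_dir)`, so each title and its text always share an index.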

## Some light cleaning. The Pushshift API results came back surprisingly clean already, which is a blessing. Also find the length of each doc.

In [64]:
# the handy nltk tokenizer 
tokenizer = TreebankWordTokenizer()

# Get the titles of the docs. The title name represents what was scraped through the Pushshift API from reddit
directory = '/Users/mthom/OneDrive - Emory University/Senior 1/QTM 340/CORPORA/'
files = glob.glob(f"{directory}/*.txt")
titles = [Path(file).stem for file in files]

# and the function
def make_sentences(list_txt):
    all_txt = []
    counter = 0
    for txt in list_txt:
        lower_txt = txt.lower()
        sentences = sent_tokenize(lower_txt)
        sentences = [tokenizer.tokenize(sent) for sent in sentences]
        all_txt += sentences
        print(titles[counter]) # Print the focus of each document
        print("Sentences: " + str(len(sentences)))  # See how long each document is. 
        counter += 1
    return all_txt

sentences = make_sentences(all_docs)

avoid
Sentences: 41363
blurts
Sentences: 11411
careless
Sentences: 24317
distracted
Sentences: 56592
fidget
Sentences: 32259
forgetful
Sentences: 56017
instructions
Sentences: 35801
interrupts
Sentences: 26131
listen
Sentences: 102072
misplaces
Sentences: 128237
moves
Sentences: 27728
onthego
Sentences: 37789
organization
Sentences: 48913
payingattention
Sentences: 75098
quiet
Sentences: 55045
seated
Sentences: 95135
talks
Sentences: 18820
waiting
Sentences: 19604


### The docs vary widely in size. The much smaller ones likely indicate that the search phrasing did not match how people describe their experiences, OR that the symptom might not be as common as expected.
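
To make the size differences easier to compare, the sentence counts printed above can be ranked and expressed as a share of the whole corpus. A small sketch: the counts are copied from the output above, while the variable names are illustrative.

```python
# Sentence counts copied from the make_sentences output above.
sentence_counts = {
    "avoid": 41363, "blurts": 11411, "careless": 24317, "distracted": 56592,
    "fidget": 32259, "forgetful": 56017, "instructions": 35801,
    "interrupts": 26131, "listen": 102072, "misplaces": 128237,
    "moves": 27728, "onthego": 37789, "organization": 48913,
    "payingattention": 75098, "quiet": 55045, "seated": 95135,
    "talks": 18820, "waiting": 19604,
}

total = sum(sentence_counts.values())

# Rank documents from largest to smallest.
ranked = sorted(sentence_counts.items(), key=lambda item: item[1], reverse=True)

for title, count in ranked:
    print(f"{title:<16} {count:>7} {count / total:6.1%}")
```

A large imbalance matters for the model below, since the biggest documents contribute far more training sentences than the smallest ones.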

## The model below is used to test out some words.

In [65]:
adhd_model = gensim.models.Word2Vec(
    sentences,
    min_count=6, # Tried a few times, 6 works well.
    vector_size=400, # Tested a few times, 400 works well.
    workers=5) 

INFO : collecting all words and their counts
INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
INFO : PROGRESS: at sentence #10000, processed 245275 words, keeping 11380 word types
INFO : PROGRESS: at sentence #20000, processed 474520 words, keeping 16616 word types
INFO : PROGRESS: at sentence #30000, processed 699954 words, keeping 20778 word types
INFO : PROGRESS: at sentence #40000, processed 922044 words, keeping 24491 word types
INFO : PROGRESS: at sentence #50000, processed 1143127 words, keeping 27406 word types
INFO : PROGRESS: at sentence #60000, processed 1367555 words, keeping 30181 word types
INFO : PROGRESS: at sentence #70000, processed 1597536 words, keeping 32449 word types
INFO : PROGRESS: at sentence #80000, processed 1819395 words, keeping 34529 word types
INFO : PROGRESS: at sentence #90000, processed 2043294 words, keeping 36185 word types
INFO : PROGRESS: at sentence #100000, processed 2288197 words, keeping 38018 word types
INFO : PROGRESS

INFO : Word2Vec lifecycle event {'msg': 'effective_min_count=6 leaves 19741828 word corpus (99.22447720604116%% of original 19896127, drops 154299)', 'datetime': '2021-12-08T17:01:13.316891', 'gensim': '4.1.2', 'python': '3.7.10 (default, Feb 26 2021, 13:06:18) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.19041-SP0', 'event': 'prepare_vocab'}
INFO : deleting the raw counts dictionary of 113957 items
INFO : sample=0.001 downsamples 55 most-common words
INFO : Word2Vec lifecycle event {'msg': 'downsampling leaves estimated 13867549.303942777 word corpus (70.2%% of prior 19741828)', 'datetime': '2021-12-08T17:01:13.586231', 'gensim': '4.1.2', 'python': '3.7.10 (default, Feb 26 2021, 13:06:18) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.19041-SP0', 'event': 'prepare_vocab'}
INFO : estimated required memory for 29075 words and 400 dimensions: 107577500 bytes
INFO : resetting layer weights
INFO : Word2Vec lifecycle event {'update': False, 'trim_rule': 'None', 'da

INFO : EPOCH 4 - PROGRESS: at 13.22% examples, 624556 words/s, in_qsize 9, out_qsize 0
INFO : EPOCH 4 - PROGRESS: at 18.04% examples, 635650 words/s, in_qsize 9, out_qsize 0
INFO : EPOCH 4 - PROGRESS: at 22.42% examples, 628086 words/s, in_qsize 9, out_qsize 0
INFO : EPOCH 4 - PROGRESS: at 27.28% examples, 634989 words/s, in_qsize 9, out_qsize 0
INFO : EPOCH 4 - PROGRESS: at 31.92% examples, 636753 words/s, in_qsize 9, out_qsize 0
INFO : EPOCH 4 - PROGRESS: at 37.71% examples, 656907 words/s, in_qsize 9, out_qsize 0
INFO : EPOCH 4 - PROGRESS: at 44.00% examples, 679142 words/s, in_qsize 10, out_qsize 0
INFO : EPOCH 4 - PROGRESS: at 50.04% examples, 695322 words/s, in_qsize 9, out_qsize 0
INFO : EPOCH 4 - PROGRESS: at 55.17% examples, 694374 words/s, in_qsize 9, out_qsize 0
INFO : EPOCH 4 - PROGRESS: at 60.53% examples, 696410 words/s, in_qsize 9, out_qsize 0
INFO : EPOCH 4 - PROGRESS: at 65.03% examples, 688272 words/s, in_qsize 9, out_qsize 0
INFO : EPOCH 4 - PROGRESS: at 68.70% examp
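
The `min_count` argument drops every word that occurs fewer than that many times before training; the log above suggests that `min_count=6` kept 29075 word types out of 113957 while retaining about 99.2% of the running words. A minimal sketch of that frequency filter on toy sentences, which imitates the effect rather than reproducing gensim's internal code; `vocab_size_at` and `toy_sentences` are hypothetical names.

```python
from collections import Counter

def vocab_size_at(tokenized_sentences, min_count):
    """Count the word types that occur at least `min_count` times."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    return sum(1 for freq in counts.values() if freq >= min_count)

# Toy stand-in for the tokenized `sentences` list.
toy_sentences = [
    ["i", "get", "distracted", "easily"],
    ["i", "get", "bored", "and", "distracted"],
    ["i", "forget", "things"],
]

for threshold in (1, 2, 3):
    print(threshold, vocab_size_at(toy_sentences, threshold))
```

Running the same function over the real `sentences` list for a few thresholds would show how much vocabulary each candidate `min_count` removes before committing to a full training run.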

In [66]:
adhd_model.save('adhd_model')

INFO : Word2Vec lifecycle event {'fname_or_handle': 'adhd_model', 'separately': 'None', 'sep_limit': 10485760, 'ignore': frozenset(), 'datetime': '2021-12-08T17:03:02.817592', 'gensim': '4.1.2', 'python': '3.7.10 (default, Feb 26 2021, 13:06:18) [MSC v.1916 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.19041-SP0', 'event': 'saving'}
INFO : storing np array 'vectors' to adhd_model.wv.vectors.npy
INFO : storing np array 'syn1neg' to adhd_model.syn1neg.npy
INFO : not storing attribute cum_table
INFO : saved adhd_model


## Time to test out some simplified version of symptoms.
### We'll start with the three titles associated with ADHD (inattention, hyperactive, impulsive)

In [67]:
adhd_model.wv.most_similar("inattentive", topn=15) # "pi" means primarily inattentive. "asd" refers to autism spectrum disorder, which is an interesting association with inattention

[('innatentive', 0.6944334506988525),
 ('pi', 0.6840545535087585),
 ('combined', 0.6721208691596985),
 ('predominantly', 0.6622161269187927),
 ('hyperactive', 0.6334655284881592),
 ('hyperactive-impulsive', 0.6233935952186584),
 ('adhd-pi', 0.6117693781852722),
 ('adhd-inattentive', 0.5840573310852051),
 ('inatentive', 0.5763453841209412),
 ('autism', 0.5720027685165405),
 ('asd', 0.5693138837814331),
 ('ocd', 0.5587474703788757),
 ('adhd-ph', 0.5540632009506226),
 ('aspergers', 0.5461999177932739),
 ('subtype', 0.5456984639167786)]
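
The scores returned by `most_similar` are cosine similarities between word vectors, so values closer to 1 mean the two words appear in more similar contexts. A minimal sketch of that ranking on made-up three-dimensional vectors (the model here uses 400 dimensions); the toy vectors and the helper names are illustrative only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# Made-up vectors; real ones come from adhd_model.wv.
toy_vectors = {
    "inattentive": [1.0, 0.2, 0.0],
    "hyperactive": [0.8, 0.6, 0.0],
    "subtype": [0.5, 0.5, 0.5],
    "recipe": [0.0, 0.1, 1.0],
}

neighbors = most_similar("inattentive", toy_vectors)
for other, score in neighbors:
    print(other, round(score, 3))
```

Because the score reflects shared context rather than shared meaning, misspellings and even opposites of a word can rank highly.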

In [68]:
adhd_model.wv.most_similar("hyperactive", topn=15) ##Some interesting user associations with the word hyperactive. Wouldn't typically associate "introverted" and "shy" with hyperactive

[('inattentive', 0.6334655284881592),
 ('disruptive', 0.6261882781982422),
 ('hyperactivity', 0.6159667372703552),
 ('fidgety', 0.6124590039253235),
 ('talkative', 0.6007491946220398),
 ('shy', 0.6003729104995728),
 ('introverted', 0.5985641479492188),
 ('dreamy', 0.5873985886573792),
 ('spacey', 0.5812333822250366),
 ('autistic', 0.580130934715271),
 ('violent', 0.5784462690353394),
 ('reserved', 0.563220202922821),
 ('stereotypical', 0.5618754029273987),
 ('chatty', 0.5561536550521851),
 ('outgoing', 0.5456730723381042)]

In [69]:
adhd_model.wv.most_similar("impulsive", topn=15) ##Again seems to be a broad array of associations. Impulsive is associated with being excitable and avoidant?

[('reckless', 0.6715306639671326),
 ('clumsy', 0.6339348554611206),
 ('forgetful', 0.6036108136177063),
 ('aggressive', 0.603134036064148),
 ('talkative', 0.5978946685791016),
 ('disorganised', 0.5937029719352722),
 ('excitable', 0.590530276298523),
 ('spacey', 0.5795295834541321),
 ('fidgety', 0.5661801695823669),
 ('risky', 0.5647457838058472),
 ('reactive', 0.5599632263183594),
 ('impatient', 0.5590059757232666),
 ('immature', 0.5548710227012634),
 ('scatterbrained', 0.5531717538833618),
 ('childish', 0.5530824661254883)]

### And now to test out some of the specific ADHD symptoms, at least the easy ones

In [70]:
adhd_model.wv.most_similar("careless", topn=10) ##The neighbors here are close in meaning to one another, and many of them carry negative connotations

[('“careless”', 0.6042772531509399),
 ('sloppy', 0.5900443196296692),
 ('forgetful', 0.5794960856437683),
 ('clumsy', 0.5588237047195435),
 ('errors', 0.5586423873901367),
 ('mistakes', 0.557075023651123),
 ('dumb', 0.5409373044967651),
 ('irresponsible', 0.5286861062049866),
 ('silly', 0.5278342366218567),
 ('stupid', 0.5268055200576782)]

In [71]:
adhd_model.wv.most_similar("organize", topn=15) ##Seems like organizing fits into a variety of actionable verbs.

[('organise', 0.8177522420883179),
 ('prioritize', 0.6396051645278931),
 ('execute', 0.5879781246185303),
 ('gather', 0.572615921497345),
 ('retrieve', 0.5601252317428589),
 ('memorize', 0.5534650087356567),
 ('implement', 0.5531120300292969),
 ('tackle', 0.5529537796974182),
 ('manage', 0.550489068031311),
 ('collect', 0.5456726551055908),
 ('visualize', 0.5372793078422546),
 ('arrange', 0.5336962342262268),
 ('articulate', 0.5332508087158203),
 ('utilize', 0.5240079164505005),
 ('commit', 0.5232197642326355)]

In [72]:
adhd_model.wv.most_similar("talkative", topn=15) ##This one is not very useful. The symptom about struggling to listen does not match well to this doc.

[('chatty', 0.820380687713623),
 ('introverted', 0.7916626334190369),
 ('outgoing', 0.7682960033416748),
 ('energetic', 0.7650719881057739),
 ('reserved', 0.744716227054596),
 ('shy', 0.7445093393325806),
 ('extroverted', 0.7416922450065613),
 ('clumsy', 0.7329865097999573),
 ('spacey', 0.7289618253707886),
 ('fidgety', 0.721991777420044),
 ('sociable', 0.7191054821014404),
 ('opinionated', 0.6932777166366577),
 ('empathetic', 0.6843442916870117),
 ('polite', 0.6789767146110535),
 ('disruptive', 0.6657646298408508)]

In [73]:
adhd_model.wv.most_similar("restless", topn=15) ##Logical similarities.

[('jittery', 0.7138804793357849),
 ('fidgety', 0.6894580721855164),
 ('agitated', 0.6757946610450745),
 ('fatigued', 0.6734806895256042),
 ('lethargic', 0.6696224212646484),
 ('irritable', 0.645696759223938),
 ('antsy', 0.6269670724868774),
 ('sleepy', 0.623403012752533),
 ('dizzy', 0.62251216173172),
 ('tense', 0.6199596524238586),
 ('anxious', 0.6144980788230896),
 ('foggy', 0.600687563419342),
 ('nauseous', 0.5971139669418335),
 ('sluggish', 0.5931520462036133),
 ('moody', 0.5616255402565002)]

In [74]:
adhd_model.wv.most_similar("seated", topn=10) ##The "remaining seated" symptom only has weak associations, but the inclusion of "sane" and "engaged" seems relevant.

[('remaining', 0.5390201210975647),
 ('seat', 0.47399428486824036),
 ('sane', 0.4624331593513489),
 ('engaged', 0.45221009850502014),
 ('awake', 0.444064199924469),
 ('unwinding', 0.42575544118881226),
 ('hydrated', 0.41824039816856384),
 ('concentrated', 0.412509024143219),
 ('passenger', 0.40965646505355835),
 ('quiet', 0.40387728810310364)]

### These very preliminary results point to several interesting findings. Several symptoms have word associations that would be considered unexpected: one wouldn't typically associate being hyperactive with being shy, or impulsiveness with being avoidant. A next step would be to refine the search terms used with the API for data collection. Further work then includes taking a more finalized set of search terms and putting their results into a table to see which words recur across the most-similar lists. From there I can fit LDA models on the commonly recurring words to build a larger lexicon of words per symptom, as well as uncover other groupings of symptoms.
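
One way the table described above might be started is to count how often each neighbor recurs across query words. A sketch under that assumption, using the top ten neighbors copied from four of the outputs above; the dictionary layout and variable names are illustrative, not a settled design.

```python
from collections import Counter

# Top-10 neighbors copied from the most_similar outputs above.
top_neighbors = {
    "hyperactive": ["inattentive", "disruptive", "hyperactivity", "fidgety",
                    "talkative", "shy", "introverted", "dreamy", "spacey",
                    "autistic"],
    "impulsive": ["reckless", "clumsy", "forgetful", "aggressive", "talkative",
                  "disorganised", "excitable", "spacey", "fidgety", "risky"],
    "talkative": ["chatty", "introverted", "outgoing", "energetic", "reserved",
                  "shy", "extroverted", "clumsy", "spacey", "fidgety"],
    "restless": ["jittery", "fidgety", "agitated", "fatigued", "lethargic",
                 "irritable", "antsy", "sleepy", "dizzy", "tense"],
}

# Count in how many neighbor lists each word appears.
counts = Counter(word for words in top_neighbors.values() for word in words)

# Keep only the words shared by at least two query terms.
shared = [(word, n) for word, n in counts.most_common() if n >= 2]
for word, n in shared:
    print(f"{word:<12} {n}")
```

With the trained model, each list could instead be built as `[w for w, _ in adhd_model.wv.most_similar(query, topn=10)]`.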

In [75]:
adhd_model.wv.most_similar("focus", topn=15)

[('concentrate', 0.8481261730194092),
 ('focused', 0.6091128587722778),
 ('hyper-focus', 0.6025528907775879),
 ('multitask', 0.5992745757102966),
 ('focusing', 0.5961366891860962),
 ('grasp', 0.5429612398147583),
 ('fixate', 0.5339042544364929),
 ('concentration', 0.5309540629386902),
 ('hyperfocus', 0.5192564129829407),
 ('perform', 0.4820605218410492),
 ('fixating', 0.481469064950943),
 ('prioritize', 0.48124343156814575),
 ('handle', 0.47508662939071655),
 ('relax', 0.4741531014442444),
 ('focussing', 0.4717353284358978)]

In [76]:
adhd_model.wv.most_similar("memory", topn=15)

[('concentration', 0.5320252776145935),
 ('handwriting', 0.5230250954627991),
 ('spatial', 0.48069703578948975),
 ('organisation', 0.4729907214641571),
 ('span', 0.4508339464664459),
 ('temper', 0.44150879979133606),
 ('forgetfulness', 0.42994481325149536),
 ('impulsiveness', 0.41667038202285767),
 ('retention', 0.4121578335762024),
 ('mood', 0.4120362401008606),
 ('organization', 0.4104546010494232),
 ('insomnia', 0.40677115321159363),
 ('listener', 0.4019554853439331),
 ('memory**', 0.40187662839889526),
 ('auditory', 0.40172940492630005)]

In [77]:
adhd_model.wv.most_similar("difficulty", topn=15)

[('trouble', 0.7610747218132019),
 ('difficulties', 0.6497974395751953),
 ('troubles', 0.6218414902687073),
 ('**trouble', 0.511652946472168),
 ('**problems', 0.498218834400177),
 ('**difficulty', 0.48839282989501953),
 ('problems', 0.45324623584747314),
 ('sustaining', 0.44685259461402893),
 ('hyperactivity**', 0.435110479593277),
 ('assists', 0.42267823219299316),
 ('maintaining', 0.42101895809173584),
 ('issues', 0.41539111733436584),
 ('leisurely', 0.4144740700721741),
 ('***problem', 0.41301286220550537),
 ('struggled', 0.4124029576778412)]
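
Several neighbors here (`**trouble`, `**problems`, `***problem`) are ordinary words with Reddit's Markdown emphasis markers still attached, which splits one word's occurrences across several vocabulary entries. A minimal sketch of a token-level cleanup that could run before training, assuming it is acceptable to strip asterisks and curly double quotes from the ends of each token; `clean_token` and `clean_sentence` are hypothetical helpers and were not part of the original preprocessing.

```python
# Characters stripped from both ends of a token: Markdown asterisks, curly quotes.
STRIP_CHARS = "*“”"

def clean_token(token):
    """Remove emphasis markers and curly quotes from the ends of one token."""
    return token.strip(STRIP_CHARS)

def clean_sentence(tokens):
    """Clean every token and drop any that become empty."""
    cleaned = (clean_token(tok) for tok in tokens)
    return [tok for tok in cleaned if tok]

example = clean_sentence(["**trouble", "with", "“careless”", "mistakes", "**"])
print(example)
```

Applied to each tokenized sentence before building the model, this would merge entries such as `**trouble` and `trouble` into one; whether that changes the neighbor lists noticeably would need to be checked by retraining.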

In [78]:
adhd_model.wv.most_similar("scatterbrained", topn=15)

[('spacey', 0.7735702991485596),
 ('clumsy', 0.7679845094680786),
 ('disorganised', 0.7478817105293274),
 ('absent-minded', 0.7193900346755981),
 ('unorganised', 0.7123687267303467),
 ('irresponsible', 0.7070583701133728),
 ('forgetful', 0.7070427536964417),
 ('unfocused', 0.7020904421806335),
 ('unorganized', 0.70011967420578),
 ('disorganized', 0.6838431358337402),
 ('lethargic', 0.68073570728302),
 ('immature', 0.6779195666313171),
 ('argumentative', 0.6622096300125122),
 ('moody', 0.6595907807350159),
 ('ditzy', 0.6540588140487671)]

In [79]:
adhd_model.wv.most_similar("dysregulation", topn=15)

[('regulation', 0.8334793448448181),
 ('disregulation', 0.8171661496162415),
 ('deregulation', 0.7275749444961548),
 ('reactivity', 0.7013592720031738),
 ('hypersensitivity', 0.6800256967544556),
 ('blunting', 0.677998423576355),
 ('outbursts', 0.6621201038360596),
 ('instability', 0.6581079959869385),
 ('flatness', 0.6484003067016602),
 ('self-regulation', 0.6396342515945435),
 ('disturbances', 0.630136251449585),
 ('impatience', 0.6208693385124207),
 ('lability', 0.6159254908561707),
 ('impulsiveness', 0.6140985488891602),
 ('explosions', 0.6006250977516174)]

In [80]:
adhd_model.wv.most_similar("distracted", topn=15)

[('sidetracked', 0.6984801292419434),
 ('bored', 0.6892499327659607),
 ('irritated', 0.6187742948532104),
 ('bored/distracted', 0.6156244874000549),
 ('overwhelmed', 0.6077423691749573),
 ('agitated', 0.5751602053642273),
 ('derailed', 0.548134446144104),
 ('frustrated', 0.5477302074432373),
 ('antsy', 0.5350170135498047),
 ('aggravated', 0.5285260677337646),
 ('impatient', 0.512834370136261),
 ('overstimulated', 0.5104950666427612),
 ('annoyed', 0.5102289915084839),
 ('startled', 0.5071300864219666),
 ('angered', 0.5055261850357056)]

In [81]:
adhd_model.wv.most_similar("overstimulation", topn=15)

[('dissociation', 0.6238409280776978),
 ('boredom', 0.6151790618896484),
 ('rumination', 0.6081263422966003),
 ('impatience', 0.6009877920150757),
 ('fatigue', 0.5999435782432556),
 ('tiredness', 0.5950149297714233),
 ('understimulation', 0.5889251232147217),
 ('exhaustion', 0.5880143046379089),
 ('unexplained', 0.579112708568573),
 ('tension', 0.575767993927002),
 ('discomfort', 0.5742940902709961),
 ('confusion', 0.5724020600318909),
 ('drowsiness', 0.5712447762489319),
 ('agitation', 0.5700607895851135),
 ('indecision', 0.5674419403076172)]

In [82]:
adhd_model.wv.most_similar("understimulation", topn=15)

[('burn-out', 0.6064639687538147),
 ('dehydration', 0.5957694053649902),
 ('overstimulation', 0.5889251828193665),
 ('disruptions', 0.584206223487854),
 ('rumination', 0.5747585892677307),
 ('agitation', 0.5697938203811646),
 ('tinnitus', 0.566262423992157),
 ('grief', 0.5649697184562683),
 ('irritation', 0.5619874596595764),
 ('dissociation', 0.5610262751579285),
 ('distress', 0.5594928860664368),
 ('arthritis', 0.5556885600090027),
 ('drowsiness', 0.5515419840812683),
 ('tmj', 0.5513576865196228),
 ('misérables*', 0.5492135882377625)]

In [83]:
adhd_model.wv.most_similar("anxiety", topn=15)

[('insomnia', 0.6081119775772095),
 ('ptsd', 0.6028245687484741),
 ('depression/anxiety', 0.5966824293136597),
 ('gad', 0.5930671095848083),
 ('ocd', 0.5719979405403137),
 ('anxiousness', 0.5653916597366333),
 ('anxiety/panic', 0.5633661150932312),
 ('fatigue', 0.5539025664329529),
 ('depression', 0.5430821776390076),
 ('migraines', 0.5427714586257935),
 ('phobia', 0.5392332673072815),
 ('mania', 0.5351836085319519),
 ('anxiety/depression', 0.5235435366630554),
 ('panic', 0.5218998193740845),
 ('rsd', 0.5110689401626587)]

In [84]:
adhd_model.wv.most_similar("depression", topn=15)

[('ptsd', 0.6753057837486267),
 ('ocd', 0.6622640490531921),
 ('depression/anxiety', 0.6395909786224365),
 ('gad', 0.6238530874252319),
 ('bpd', 0.6113967895507812),
 ('mdd', 0.6047233939170837),
 ('insomnia', 0.5990356206893921),
 ('depressive', 0.5762543082237244),
 ('narcolepsy', 0.5686822533607483),
 ('generalized', 0.5655974745750427),
 ('anxiety', 0.5430821776390076),
 ('anxiety/depression', 0.5371780395507812),
 ('bipolar', 0.5371232628822327),
 ('dysthymia', 0.5363638401031494),
 ('cptsd', 0.534454345703125)]

In [85]:
adhd_model.wv.most_similar("sleep", topn=15)

[('sleeping', 0.5822163820266724),
 ('asleep', 0.537973940372467),
 ('bed', 0.5157887935638428),
 ('exercise', 0.4770318567752838),
 ('sleepless', 0.42628923058509827),
 ('eat', 0.4252566695213318),
 ('appetite', 0.41989952325820923),
 ('wake', 0.4183107316493988),
 ('eating', 0.4168846905231476),
 ('drink', 0.41547060012817383),
 ('function', 0.4149003326892853),
 ('meditate', 0.40719714760780334),
 ('headaches', 0.40476006269454956),
 ('diet', 0.403209924697876),
 ('relax', 0.40277156233787537)]

In [86]:
adhd_model.wv.most_similar("tasks", topn=15)

[('task', 0.7064698338508606),
 ('chores', 0.6773264408111572),
 ('things', 0.6650495529174805),
 ('projects', 0.6244637370109558),
 ('activities', 0.5851579308509827),
 ('items', 0.5803000926971436),
 ('assignments', 0.5648313760757446),
 ('schoolwork', 0.5349326133728027),
 ('responsibilities', 0.4776991903781891),
 ('routines', 0.4760463535785675),
 ('worksheets', 0.47448819875717163),
 ('stuff', 0.4727061092853546),
 ('goals', 0.46987056732177734),
 ('things**', 0.4682238698005676),
 ('steps', 0.4666711390018463)]

In [87]:
adhd_model.wv.most_similar("blurts", topn=15)

[('blurted', 0.7502467632293701),
 ('blurting', 0.717305600643158),
 ('-spacing', 0.7110857963562012),
 ('lashes', 0.7055999636650085),
 ('blurt', 0.6727300882339478),
 ('tuning', 0.6687672138214111),
 ('fizzle', 0.665898323059082),
 ('blacking', 0.6641537547111511),
 ('blacked', 0.6615590453147888),
 ('lashing', 0.6576511263847351),
 ('aired', 0.6561911106109619),
 ('maxed', 0.6533636450767517),
 ('spit', 0.6533453464508057),
 ('chickened', 0.6454461216926575),
 ('spurt', 0.6394838690757751)]

In [88]:
adhd_model.wv.most_similar("careless", topn=15)

[('“careless”', 0.6042772531509399),
 ('sloppy', 0.5900443196296692),
 ('forgetful', 0.5794960856437683),
 ('clumsy', 0.5588237047195435),
 ('errors', 0.5586423873901367),
 ('mistakes', 0.557075023651123),
 ('dumb', 0.5409373044967651),
 ('irresponsible', 0.5286861062049866),
 ('silly', 0.5278342366218567),
 ('stupid', 0.5268055200576782),
 ('grammatical', 0.5169370174407959),
 ('mistakes/lacks', 0.5039103627204895),
 ('spacey', 0.49755796790122986),
 ('silliest', 0.49596911668777466),
 ('disorganized', 0.4902001619338989)]

In [89]:
adhd_model.wv.most_similar("distracted", topn=15)

[('sidetracked', 0.6984801292419434),
 ('bored', 0.6892499327659607),
 ('irritated', 0.6187742948532104),
 ('bored/distracted', 0.6156244874000549),
 ('overwhelmed', 0.6077423691749573),
 ('agitated', 0.5751602053642273),
 ('derailed', 0.548134446144104),
 ('frustrated', 0.5477302074432373),
 ('antsy', 0.5350170135498047),
 ('aggravated', 0.5285260677337646),
 ('impatient', 0.512834370136261),
 ('overstimulated', 0.5104950666427612),
 ('annoyed', 0.5102289915084839),
 ('startled', 0.5071300864219666),
 ('angered', 0.5055261850357056)]

In [90]:
adhd_model.wv.most_similar("fidget", topn=15)

[('stim', 0.6484668254852295),
 ('fidgeting', 0.5641934871673584),
 ('fiddle', 0.5423783659934998),
 ('fidgets', 0.5331591367721558),
 ('daydream', 0.530850887298584),
 ('fiddling', 0.5207068920135498),
 ('tap', 0.5135927200317383),
 ('doodle', 0.5035949349403381),
 ('whistle', 0.49502500891685486),
 ('tapping', 0.4946359097957611),
 ('pens', 0.4945749342441559),
 ('rubix', 0.49444466829299927),
 ('rubik’s', 0.4745483100414276),
 ('pen', 0.46283578872680664),
 ('legs', 0.46209532022476196)]

In [91]:
adhd_model.wv.most_similar("forgetful", topn=15)

[('disorganised', 0.7627739310264587),
 ('disorganized', 0.7618280053138733),
 ('clumsy', 0.7485203146934509),
 ('spacey', 0.7073506116867065),
 ('scatterbrained', 0.7070428133010864),
 ('unorganized', 0.698992908000946),
 ('unorganised', 0.6882550120353699),
 ('absent-minded', 0.6710038185119629),
 ('unfocused', 0.6527332663536072),
 ('irresponsible', 0.6281096339225769),
 ('fidgety', 0.6246336698532104),
 ('talkative', 0.6128419041633606),
 ('argumentative', 0.608806848526001),
 ('impulsive', 0.6036108136177063),
 ('moody', 0.592080295085907)]

In [92]:
adhd_model.wv.most_similar("instructions", topn=15)

[('directions', 0.7349821925163269),
 ('instruction', 0.5683904886245728),
 ('verbal', 0.49578818678855896),
 ('instructions”**', 0.4955167770385742),
 ('numbers', 0.4606901705265045),
 ('recipes', 0.45985323190689087),
 ('instructions*', 0.44155654311180115),
 ('orders', 0.4275442659854889),
 ('sentences', 0.4273284077644348),
 ('details', 0.4231005311012268),
 ('protocol', 0.41872793436050415),
 ('carefully', 0.41827085614204407),
 ('rules', 0.41671115159988403),
 ('texts', 0.41591373085975647),
 ('tasks', 0.4153897166252136)]

In [93]:
adhd_model.wv.most_similar("interrupts", topn=15)

[('intrudes', 0.7125340104103088),
 ('interrupt', 0.6045215725898743),
 ('speaks', 0.6014426946640015),
 ('avoids', 0.5826585292816162),
 ('interrupting', 0.5731493234634399),
 ('looses', 0.5521567463874817),
 ('ignores', 0.5521173477172852),
 ('hears', 0.5465902090072632),
 ('tells', 0.5388582348823547),
 ('talks', 0.5344032645225525),
 ('fidgets', 0.5326068997383118),
 ('watches', 0.5277583003044128),
 ('distracts', 0.5258309841156006),
 ('compares', 0.5161986351013184),
 ('asks', 0.5160029530525208)]

In [94]:
adhd_model.wv.most_similar("listen", topn=15)

[('listening', 0.6309062242507935),
 ('listened', 0.6072753071784973),
 ('hear', 0.5604420900344849),
 ('respond', 0.5445165038108826),
 ('connect', 0.5430163741111755),
 ('interrupt', 0.5172647833824158),
 ('speak', 0.513073742389679),
 ('talk', 0.5067105889320374),
 ('ignore', 0.5038173198699951),
 ('concentrate', 0.5020611882209778),
 ('sing', 0.5016178488731384),
 ('engage', 0.4970810115337372),
 ('retain', 0.482991099357605),
 ('comprehend', 0.47226476669311523),
 ('understand', 0.4700518250465393)]

In [95]:
adhd_model.wv.most_similar("misplaces", topn=15)

[('misplacing', 0.72845858335495),
 ('misplace', 0.68247389793396),
 ('looses', 0.6740795969963074),
 ('forgets', 0.6724807620048523),
 ('aaid', 0.6218665242195129),
 ('loses', 0.6211382746696472),
 ('misplacing/losing', 0.6071367859840393),
 ('losing/forgetting', 0.5791614055633545),
 ('***loses', 0.5569247007369995),
 ('organises', 0.5467885136604309),
 ('hoard', 0.5422442555427551),
 ('forget/lose', 0.5261827111244202),
 ('overreact', 0.5240338444709778),
 ('**clicked**', 0.519274115562439),
 ('complains', 0.5121062994003296)]

In [96]:
adhd_model.wv.most_similar("moves", topn=15)

[('jumps', 0.5691097974777222),
 ('bounces', 0.5261719822883606),
 ('goes', 0.48491916060447693),
 ('leaves', 0.46128982305526733),
 ('carries', 0.45502084493637085),
 ('starts', 0.4479454755783081),
 ('switches', 0.4445071518421173),
 ('pulls', 0.44132769107818604),
 ('enters', 0.43851253390312195),
 ('move', 0.4318757653236389),
 ('drifts', 0.4316878318786621),
 ('moving', 0.4308895170688629),
 ('speaks', 0.43052104115486145),
 ('latches', 0.42929506301879883),
 ('“foggy”', 0.4280104637145996)]

In [97]:
adhd_model.wv.most_similar("restless", topn=15)

[('jittery', 0.7138804793357849),
 ('fidgety', 0.6894580721855164),
 ('agitated', 0.6757946610450745),
 ('fatigued', 0.6734806895256042),
 ('lethargic', 0.6696224212646484),
 ('irritable', 0.645696759223938),
 ('antsy', 0.6269670724868774),
 ('sleepy', 0.623403012752533),
 ('dizzy', 0.62251216173172),
 ('tense', 0.6199596524238586),
 ('anxious', 0.6144980788230896),
 ('foggy', 0.600687563419342),
 ('nauseous', 0.5971139669418335),
 ('sluggish', 0.5931520462036133),
 ('moody', 0.5616255402565002)]

In [98]:
adhd_model.wv.most_similar("organize", topn=15)

[('organise', 0.8177522420883179),
 ('prioritize', 0.6396051645278931),
 ('execute', 0.5879781246185303),
 ('gather', 0.572615921497345),
 ('retrieve', 0.5601252317428589),
 ('memorize', 0.5534650087356567),
 ('implement', 0.5531120300292969),
 ('tackle', 0.5529537796974182),
 ('manage', 0.550489068031311),
 ('collect', 0.5456726551055908),
 ('visualize', 0.5372793078422546),
 ('arrange', 0.5336962342262268),
 ('articulate', 0.5332508087158203),
 ('utilize', 0.5240079164505005),
 ('commit', 0.5232197642326355)]

In [99]:
adhd_model.wv.most_similar("concentration", topn=15)

[('productivity', 0.5897951722145081),
 ('attentiveness', 0.5828397274017334),
 ('retention', 0.5812674760818481),
 ('organisation', 0.577720582485199),
 ('motivation', 0.5686666369438171),
 ('self-confidence', 0.5524927973747253),
 ('follow-through', 0.5472126007080078),
 ('organization', 0.5367908477783203),
 ('impulsivity', 0.5333048701286316),
 ('memory', 0.5320252180099487),
 ('focus', 0.5309539437294006),
 ('impulsiveness', 0.5192360281944275),
 ('self-control', 0.5099266171455383),
 ('appetite', 0.5020779371261597),
 ('libido', 0.5010485053062439)]

In [100]:
adhd_model.wv.most_similar("quiet", topn=15)

[('silent', 0.6159137487411499),
 ('talkative', 0.5776931643486023),
 ('shy', 0.5582040548324585),
 ('introverted', 0.5527591109275818),
 ('polite', 0.5510299801826477),
 ('reserved', 0.5425533652305603),
 ('attentive', 0.5326888561248779),
 ('calm', 0.5306251645088196),
 ('noisy', 0.5257624387741089),
 ('relaxed', 0.5058287382125854),
 ('fidgety', 0.5013799667358398),
 ('busy', 0.4956093430519104),
 ('disruptive', 0.49334895610809326),
 ('active', 0.49274972081184387),
 ('outgoing', 0.48506227135658264)]

In [101]:
adhd_model.wv.most_similar("seated", topn=15)

[('remaining', 0.5390201210975647),
 ('seat', 0.47399428486824036),
 ('sane', 0.4624331593513489),
 ('engaged', 0.45221009850502014),
 ('awake', 0.444064199924469),
 ('unwinding', 0.42575544118881226),
 ('hydrated', 0.41824039816856384),
 ('concentrated', 0.412509024143219),
 ('passenger', 0.40965646505355835),
 ('quiet', 0.40387728810310364),
 ('alert', 0.38515833020210266),
 ('afloat', 0.37947407364845276),
 ('expected*', 0.37441834807395935),
 ('indoors', 0.37200212478637695),
 ('stimulated', 0.3626222312450409)]

In [102]:
adhd_model.wv.most_similar("talks", topn=15)

[('complains', 0.7013084292411804),
 ('speaks', 0.6362513303756714),
 ('joked', 0.6163971424102783),
 ('complained', 0.6011800169944763),
 ('spoke', 0.5905921459197998),
 ('talk', 0.5868416428565979),
 ('talked', 0.5865479707717896),
 ('listens', 0.5811729431152344),
 ('cares', 0.5776601433753967),
 ('sees', 0.5538138747215271),
 ('notices', 0.5536605715751648),
 ('forgets', 0.5532332062721252),
 ('enjoys', 0.5524865388870239),
 ('cries', 0.5524336695671082),
 ('talking', 0.5461887121200562)]

In [103]:
adhd_model.wv.most_similar("waiting", topn=15)

[('wait', 0.5577406883239746),
 ('awaiting', 0.5198859572410583),
 ('searching', 0.46206167340278625),
 ('waited', 0.43635067343711853),
 ('fishing', 0.4176577627658844),
 ('compiling', 0.4144503176212311),
 ('preparing', 0.4030948281288147),
 ('waitlist', 0.3866419494152069),
 ('queue', 0.3772072494029999),
 ('£300', 0.3707655072212219),
 ('scheduled', 0.3698205053806305),
 ('22nd', 0.357849657535553),
 ('riding', 0.3483898341655731),
 ('lurking', 0.3471589982509613),
 ('going', 0.34461697936058044)]

In [104]:
adhd_model.wv.most_similar("organize", topn=20)

[('organise', 0.8177522420883179),
 ('prioritize', 0.6396051645278931),
 ('execute', 0.5879781246185303),
 ('gather', 0.572615921497345),
 ('retrieve', 0.5601252317428589),
 ('memorize', 0.5534650087356567),
 ('implement', 0.5531120300292969),
 ('tackle', 0.5529537796974182),
 ('manage', 0.550489068031311),
 ('collect', 0.5456726551055908),
 ('visualize', 0.5372793078422546),
 ('arrange', 0.5336962342262268),
 ('articulate', 0.5332508087158203),
 ('utilize', 0.5240079164505005),
 ('commit', 0.5232197642326355),
 ('motivate', 0.5192403197288513),
 ('create', 0.5161908268928528),
 ('accomplish', 0.515205442905426),
 ('simplify', 0.5114571452140808),
 ('organizing', 0.5077037811279297)]

In [119]:
adhd_model.wv.most_similar("prioritize", topn=20)

[('organize', 0.6396051645278931),
 ('execute', 0.634045422077179),
 ('accomplish', 0.6207945346832275),
 ('tackle', 0.6202678084373474),
 ('organise', 0.5969576239585876),
 ('perform', 0.577804684638977),
 ('multitask', 0.5767335295677185),
 ('commit', 0.5645694136619568),
 ('initiate', 0.5412926077842712),
 ('concentrate', 0.5382057428359985),
 ('visualize', 0.5352488160133362),
 ('prioritizing', 0.5348851680755615),
 ('prioritise', 0.5323051810264587),
 ('regulate', 0.5275587439537048),
 ('achieve', 0.5253454446792603),
 ('memorize', 0.5237484574317932),
 ('juggle', 0.5216630101203918),
 ('implement', 0.5195894837379456),
 ('delegate', 0.5174523591995239),
 ('complete', 0.509374737739563)]

In [106]:
adhd_model.wv.most_similar("focus", topn=20)

[('concentrate', 0.8481261730194092),
 ('focused', 0.6091128587722778),
 ('hyper-focus', 0.6025528907775879),
 ('multitask', 0.5992745757102966),
 ('focusing', 0.5961366891860962),
 ('grasp', 0.5429612398147583),
 ('fixate', 0.5339042544364929),
 ('concentration', 0.5309540629386902),
 ('hyperfocus', 0.5192564129829407),
 ('perform', 0.4820605218410492),
 ('fixating', 0.481469064950943),
 ('prioritize', 0.48124343156814575),
 ('handle', 0.47508662939071655),
 ('relax', 0.4741531014442444),
 ('focussing', 0.4717353284358978),
 ('concentrating', 0.462649405002594),
 ('rely', 0.4593639373779297),
 ('comprehend', 0.4556039869785309),
 ('sustain', 0.4512845575809479),
 ('hyperfixate', 0.44925710558891296)]

In [107]:
adhd_model.wv.most_similar("initiate", topn=20)

[('prioritize', 0.5412926077842712),
 ('execute', 0.5237286686897278),
 ('initiating', 0.5086759924888611),
 ('executing', 0.4979875981807709),
 ('engage', 0.4696354866027832),
 ('maintain', 0.46862679719924927),
 ('contribute', 0.4610580503940582),
 ('anticipate', 0.4579991400241852),
 ('tackle', 0.45350775122642517),
 ('avoid', 0.4466940760612488),
 ('postpone', 0.4412894546985626),
 ('requiring', 0.4404865801334381),
 ('implement', 0.438987135887146),
 ('assign', 0.4372934401035309),
 ('commit', 0.43635544180870056),
 ('organising', 0.4350680708885193),
 ('involve', 0.4341478645801544),
 ('delegate', 0.4319782257080078),
 ('ignore', 0.43055030703544617),
 ('juggle', 0.4277782142162323)]

In [109]:
adhd_model.wv.most_similar("follow-through", topn=20)

[('prioritization', 0.6340122818946838),
 ('prioritizing', 0.6057565808296204),
 ('prioritisation', 0.6024304032325745),
 ('organisation', 0.5664590001106262),
 ('punctuality', 0.5568532347679138),
 ('concentration', 0.547212541103363),
 ('organization', 0.5437710881233215),
 ('avoids/dislikes', 0.5376207828521729),
 ('short-term', 0.5371763110160828),
 ('initiation', 0.5340088605880737),
 ('prioritising', 0.5322688817977905),
 ('coordination', 0.5313054323196411),
 ('disorganisation', 0.530411422252655),
 ('articulation', 0.5224964618682861),
 ('dependability', 0.512200117111206),
 ('procrastination', 0.5075905323028564),
 ('disorganization', 0.5075584650039673),
 ('self-discipline', 0.5022214651107788),
 ('sticky-notes', 0.49965935945510864),
 ('persistence', 0.49525710940361023)]

In [121]:
adhd_model.wv.most_similar("remember", topn=20)

[('recall', 0.7709953188896179),
 ('comprehend', 0.5800585150718689),
 ('fathom', 0.5454567670822144),
 ('imagine', 0.5428937673568726),
 ('memorize', 0.49510109424591064),
 ('pinpoint', 0.4947699308395386),
 ('forget', 0.4742870628833771),
 ('remeber', 0.4687615633010864),
 ('retain', 0.4676784873008728),
 ('concentrate', 0.462189644575119),
 ('remembered', 0.4572097957134247),
 ('multitask', 0.4543330669403076),
 ('understand', 0.4529891908168793),
 ('summarize', 0.4504868686199188),
 ('absorb', 0.4489503800868988),
 ('predict', 0.44815006852149963),
 ('accomplish', 0.4414917528629303),
 ('register', 0.4393739700317383),
 ('describe', 0.4330478012561798),
 ('locate', 0.43122756481170654)]

In [112]:
adhd_model.wv.most_similar("forget", topn=20)

[('forgot', 0.6664561629295349),
 ('forgetting', 0.6448471546173096),
 ('miss', 0.6135790944099426),
 ('misplace', 0.5845202207565308),
 ('forgotten', 0.5579928159713745),
 ('lose', 0.5197671055793762),
 ('forgets', 0.5192765593528748),
 ('remembered', 0.5059275031089783),
 ('procrastinate', 0.49983686208724976),
 ('overthink', 0.4931754171848297),
 ('remember', 0.4742870330810547),
 ('remembering', 0.4496007263660431),
 ('misplacing', 0.4304111897945404),
 ('ignore', 0.4261144697666168),
 ('loose', 0.41650789976119995),
 ('misread', 0.4155377447605133),
 ('leave', 0.407657265663147),
 ('reread', 0.4022226333618164),
 ('misplaced', 0.40076491236686707),
 ('daydream', 0.3936951160430908)]

In [113]:
adhd_model.wv.most_similar("emotional", topn=20)

[('anger', 0.6036617159843445),
 ('emotion', 0.5815500617027283),
 ('intense', 0.5339415669441223),
 ('aggression', 0.5328584909439087),
 ('emotions', 0.5317428112030029),
 ('extreme', 0.5274925827980042),
 ('irritability', 0.5271798372268677),
 ('insecurity', 0.5182892680168152),
 ('frustration', 0.5152567028999329),
 ('rejection', 0.5093329548835754),
 ('impulsiveness', 0.5083732008934021),
 ('explosive', 0.5076797604560852),
 ('impatience', 0.5038495063781738),
 ('impulsivity', 0.500531017780304),
 ('violent', 0.500497579574585),
 ('empathy', 0.48734813928604126),
 ('uncontrollable', 0.4865398406982422),
 ('internal', 0.4751785099506378),
 ('sensory', 0.4704394042491913),
 ('restlessness', 0.4690715968608856)]

In [122]:
adhd_model.wv.most_similar("mood", topn=20)

[('moods', 0.6146122813224792),
 ('irritability', 0.5238090753555298),
 ('appetite', 0.4935661256313324),
 ('insomnia', 0.4660658538341522),
 ('nausea', 0.4586219787597656),
 ('headaches', 0.45429497957229614),
 ('self-esteem', 0.45314645767211914),
 ('concentration', 0.44794073700904846),
 ('diet', 0.4434126615524292),
 ('migraines', 0.44115638732910156),
 ('fatigue', 0.4401665925979614),
 ('stomach', 0.43692874908447266),
 ('paranoia', 0.43351972103118896),
 ('libido', 0.42743536829948425),
 ('listener', 0.423074871301651),
 ('rhythm', 0.42078807950019836),
 ('lows', 0.4190419912338257),
 ('attentiveness', 0.4188489317893982),
 ('headache', 0.41810470819473267),
 ('tiredness', 0.4178123474121094)]

In [115]:
adhd_model.wv.most_similar("irritable", topn=20)

[('agitated', 0.787293553352356),
 ('jittery', 0.7852428555488586),
 ('lethargic', 0.7586954236030579),
 ('flustered', 0.7448184490203857),
 ('moody', 0.7401297688484192),
 ('anxious', 0.7252708673477173),
 ('sleepy', 0.7244883179664612),
 ('nauseous', 0.7173300385475159),
 ('irritated', 0.7143795490264893),
 ('angry', 0.6804870367050171),
 ('overstimulated', 0.678596556186676),
 ('dizzy', 0.6768774390220642),
 ('antsy', 0.6655011177062988),
 ('grumpy', 0.6620075106620789),
 ('impatient', 0.6607453227043152),
 ('fidgety', 0.6564841866493225),
 ('defensive', 0.6544305086135864),
 ('depressed', 0.6536476612091064),
 ('fatigued', 0.64913409948349),
 ('restless', 0.6456966996192932)]

In [118]:
adhd_model.wv.most_similar("daydream", topn=20)

[('daydreaming', 0.6468665599822998),
 ('zone', 0.6026422381401062),
 ('doodle', 0.6003548502922058),
 ('daydreamed', 0.5714013576507568),
 ('procrastinate', 0.551242470741272),
 ('fidget', 0.530850887298584),
 ('stutter', 0.5236532092094421),
 ('overthink', 0.4938117563724518),
 ('tap', 0.4937859773635864),
 ('ramble', 0.48540639877319336),
 ('fidgeting', 0.4802742600440979),
 ('interrupt', 0.4799862802028656),
 ('procastinate', 0.47386401891708374),
 ('doodling', 0.4692108929157257),
 ('drift', 0.46743130683898926),
 ('stim', 0.465459406375885),
 ('overshare', 0.4645114839076996),
 ('zoning', 0.4634295701980591),
 ('fidgety', 0.45977863669395447),
 ('wander', 0.45605629682540894)]

In [125]:
adhd_model.wv.most_similar("regulate", topn=20)

[('sustain', 0.6455444693565369),
 ('reduce', 0.626389741897583),
 ('express', 0.6256769895553589),
 ('suppress', 0.5975077748298645),
 ('inhibit', 0.5852774977684021),
 ('contain', 0.5721020102500916),
 ('acquire', 0.571570098400116),
 ('mitigate', 0.5679916739463806),
 ('strengthen', 0.5678445100784302),
 ('tame', 0.5632448196411133),
 ('minimize', 0.5629931688308716),
 ('alleviate', 0.5612228512763977),
 ('satisfy', 0.5578768849372864),
 ('articulate', 0.5574641823768616),
 ('improve', 0.5559655427932739),
 ('overcome', 0.5559239983558655),
 ('redirect', 0.5545377135276794),
 ('soothe', 0.5531738996505737),
 ('organise', 0.5513549447059631),
 ('maintain', 0.5494052767753601)]

In [127]:
adhd_model.wv.most_similar("manage", topn=20)

[('cope', 0.5642402768135071),
 ('organize', 0.5504890084266663),
 ('organise', 0.5380420088768005),
 ('navigate', 0.5341765284538269),
 ('adapt', 0.5329495668411255),
 ('overcome', 0.530820906162262),
 ('handle', 0.5278725624084473),
 ('improve', 0.5140276551246643),
 ('survive', 0.5016991496086121),
 ('regulate', 0.5002877712249756),
 ('prioritize', 0.4985707104206085),
 ('tackle', 0.49598684906959534),
 ('implement', 0.4886930286884308),
 ('combat', 0.48388373851776123),
 ('perform', 0.4761952757835388),
 ('motivate', 0.4742758274078369),
 ('utilize', 0.4721716344356537),
 ('maintain', 0.4676174819469452),
 ('managing', 0.46663764119148254),
 ('function', 0.4597117304801941)]
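
The cells above repeat the same `most_similar` call once per symptom term. A minimal sketch of how those lookups could be batched is shown here, assuming gensim 4.x, where `KeyedVectors` exposes its vocabulary as `key_to_index`. The helper `neighbors_for_terms` and the `ToyVectors` stand-in are hypothetical names that are not part of the original notebook, and the toy similarity values are simply rounded from the outputs above so the block runs without the trained model.

```python
from typing import Dict, Iterable, List, Tuple


def neighbors_for_terms(wv, terms: Iterable[str], topn: int = 20) -> Dict[str, List[Tuple[str, float]]]:
    """Run most_similar for each term, skipping terms the model never saw."""
    results = {}
    for term in terms:
        # Assumption: gensim 4.x KeyedVectors, which expose the vocabulary as key_to_index.
        if term not in wv.key_to_index:
            print(f"skipping {term!r}: not in vocabulary")
            continue
        results[term] = wv.most_similar(term, topn=topn)
    return results


class ToyVectors:
    """Hypothetical stand-in for adhd_model.wv so the sketch runs on its own."""

    def __init__(self, table):
        self._table = table
        self.key_to_index = {word: i for i, word in enumerate(table)}

    def most_similar(self, word, topn=10):
        # Return the first topn precomputed neighbours for the word.
        return self._table[word][:topn]


# Illustrative values only, rounded from the notebook outputs.
toy_wv = ToyVectors({
    "organize": [("organise", 0.82), ("prioritize", 0.64)],
    "focus": [("concentrate", 0.85), ("focused", 0.61)],
})

symptom_terms = ["organize", "focus", "follow-through"]
toy_results = neighbors_for_terms(toy_wv, symptom_terms, topn=1)
for term, neighbors in toy_results.items():
    print(term, neighbors)
```

With the real model, the call would presumably be `neighbors_for_terms(adhd_model.wv, symptom_terms)`. The vocabulary check matters because `most_similar` raises a `KeyError` for a word that is missing from the trained vocabulary.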
