KishoreKonakanti/DomainAnalytics

DomainAnalytics

Code to reproduce the results described in the domain analytics article at https://medium.com/@enigma2006.kishore/analysis-about-ai-ml-io-domains-6a5cdac91a46

Packages used:

Common: re, time, threading, os, csv, pandas, numpy, collections, urllib

May require a new install:

bs4 (Beautiful Soup) ( https://www.crummy.com/software/BeautifulSoup/bs4/doc/ )

wordcloud ( https://github.com/amueller/word_cloud )

nltk ( www.nltk.org )

langid ( https://github.com/saffsd/langid.py.git ): An offline module to detect language

Description of script files:

  1. PullLinks.py: Collects links and writes them to a file (referred to as linkFile)

  2. FetchDetailsOfWebsite_HyperThreaded.py: For each link written in step 1, loads the required details of the website and writes them to a csv file

  3. WebsiteAnalysis.py: Uses pandas to read the csv file, run the analysis, and draw graphs
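As a rough illustration of how steps 1 and 2 fit together, the sketch below reads links from a link file, processes them on several threads, and writes one csv row per link. It is not the code in this repository: `fetch_details`, `worker`, `run_pipeline`, the `n_threads` parameter, and the output columns are all hypothetical names chosen for the example, and the fetch step is a stub that makes no network calls.

```python
import csv
import threading
from queue import Queue, Empty


def fetch_details(link):
    """Hypothetical stand-in for the real per-site fetch; returns one row."""
    return {"link": link, "length": len(link)}


def worker(links, rows, lock, fetch):
    # Take links until the queue is empty; record failures instead of crashing.
    while True:
        try:
            link = links.get_nowait()
        except Empty:
            return
        try:
            row = fetch(link)
        except Exception as exc:
            row = {"link": link, "error": str(exc)}
        with lock:
            rows.append(row)


def run_pipeline(link_file, csv_file, fetch=fetch_details, n_threads=4):
    # Step 1 output: one link per line, blank lines ignored.
    with open(link_file, encoding="utf-8") as fh:
        link_list = [line.strip() for line in fh if line.strip()]

    links = Queue()
    for link in link_list:
        links.put(link)

    rows, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(links, rows, lock, fetch))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Step 2 output: a csv file with the union of all row keys as columns.
    fieldnames = sorted({key for row in rows for key in row})
    with open(csv_file, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `run_pipeline("links.txt", "details.csv")` with a real fetch function passed as `fetch` would produce the csv file that step 3 reads with pandas. Rows are written in completion order, not in link-file order.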

To Do:

  1. Add a parameter file that supplies the required parameters, so that users do not have to edit the code
  2. Add a driver script that runs all three steps described above
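One possible shape for the parameter file in item 1 is an INI file read with the standard `configparser` module. The sketch below is only a suggestion: the `[pipeline]` section, the keys `link_file`, `csv_file`, and `threads`, their default values, and the `load_params` helper are all assumptions, not something the scripts currently support.

```python
import configparser

# Hypothetical defaults, used when the file or a key is missing.
DEFAULTS = {"link_file": "links.txt", "csv_file": "details.csv", "threads": "4"}


def load_params(path):
    """Read pipeline parameters from an INI file, falling back to DEFAULTS."""
    parser = configparser.ConfigParser()
    parser.read_dict({"pipeline": DEFAULTS})
    # ConfigParser.read ignores files that do not exist, so defaults survive.
    parser.read(path, encoding="utf-8")
    section = parser["pipeline"]
    return {
        "link_file": section["link_file"],
        "csv_file": section["csv_file"],
        "threads": section.getint("threads"),
    }
```

A driver script (item 2) could call `load_params("params.ini")` once and pass the resulting values to each of the three steps, where `params.ini` is again a hypothetical file name.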
