Anakeyn/TFIDFKeywordsSuggest

TF-IDF KeywordsSuggest - Version Alpha 0.1 Licence GPL 3

Version 0.1 : tfidfkeywordssuggest.py updated following a change in the googlesearch library. Minor bug fixed.

Anakeyn TF-IDF Keywords Suggest is a keyword suggestion tool for SEO and Web Marketing purposes. This tool searches for and stores the first x pages returned by Google for a given keyword.

Next, the system retrieves the content of the pages in order to find popular and original keywords/expressions in the subject area. The system works with a TF-IDF algorithm.

TF-IDF means term frequency–inverse document frequency. TF-IDF is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.
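As a rough illustration of the statistic, here is a minimal sketch in plain Python. It assumes the common weighting tf × log(N / df); the exact variant used by the program is not specified here and may differ. The function name `tfidf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: tf-idf} dict per document.

    tf  = occurrences of the term in the document / number of terms in the document
    idf = log(number of documents / number of documents containing the term)
    This is one common variant, assumed here for illustration only.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical sample corpus.
docs = [
    "seo tools for keyword research",
    "keyword research for web marketing",
    "python web framework",
]
scores = tfidf(docs)
```

In this sample, "seo" appears in only one document, so it gets a higher weight in the first document than "keyword", which appears in two.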

To obtain a "global" TF-IDF value for each term, we average its TF-IDF over the documents in two ways : the mean over all documents identifies popular expressions, and the mean over only the documents where the value is non-zero identifies original expressions.
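The difference between the two averages can be sketched as follows. This is an illustration under assumptions, not the program's actual code: the function name `global_scores`, the input layout (one dict of scores per document) and the sample values are all hypothetical.

```python
def global_scores(matrix):
    """matrix: list of {term: tfidf} dicts, one per document.

    Returns two dicts:
      mean         - average over all documents (absent terms count as 0)
      nonzero_mean - average over the documents where the score is non-zero
    """
    n_docs = len(matrix)
    terms = {term for row in matrix for term in row}
    mean, nonzero_mean = {}, {}
    for term in terms:
        values = [row.get(term, 0.0) for row in matrix]
        nonzero = [v for v in values if v != 0.0]
        mean[term] = sum(values) / n_docs
        nonzero_mean[term] = sum(nonzero) / len(nonzero) if nonzero else 0.0
    return mean, nonzero_mean


# Hypothetical scores for three documents.
matrix = [
    {"keyword research": 0.2, "web marketing": 0.1},
    {"keyword research": 0.3},
    {"keyword research": 0.1, "swahili seo": 0.45},
]
mean, nonzero_mean = global_scores(matrix)

most_popular = max(mean, key=mean.get)
most_original = max(nonzero_mean, key=nonzero_mean.get)
```

With these sample values, "keyword research" ranks first by plain mean because it appears in every document, while "swahili seo" ranks first by non-zero mean because it weighs heavily in the single document that contains it.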

The program is developed in Python as a Web application using Flask (web framework), Jinja2 (web template engine), SQLAlchemy (object-relational mapping for SQL databases), Bootstrap (front-end framework) ...

STRUCTURE :

KeywordsSuggest
|   database.db
|   favicon.ico
|   tfidfkeywordssuggest.py
|   license.txt
|   myconfig.py
|   requirements.txt
|   __init__.py
|   
+---configdata
|       tldLang.xlsx
|       user_agents-taglang.txt
|       
+---static
|       Anakeyn_Rectangle.jpg
|       tfidfkeywordssuggest.css
|       Oeil_Anakeyn.jpg
|       signin.css
|       starter-template.css
|              
+---templates
|       index.html
|       tfidfkeywordssuggest.html
|       login.html
|       signup.html
|       
+---uploads

By default the system works with a SQLite database called database.db which is created the first time you use the program. The main program is "tfidfkeywordssuggest.py".

Default config variables are in the myconfig.py file including the 2 default users : admin (pwd "adminpwd") and guest (pwd "guestpwd")

Other configuration data is available in the configdata subdirectory, in 2 files :

tldLang.xlsx : parameters for Google Top Level domains and Search Engines Results Pages languages (358 combinations).

user_agents-taglang.txt : a list of valid user agents (4281 entries), one of which is sent to Google at random to avoid blocking.
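Picking a user agent at random can be sketched as below. This assumes the file holds one user agent per line, which is not confirmed here; the helper names and the two sample user-agent strings are hypothetical, and the sketch writes its own small sample file instead of reading the real one.

```python
import random
import tempfile
from pathlib import Path


def load_user_agents(path):
    """Read one user agent per line (assumed format), skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def pick_user_agent(agents):
    """Return one user agent chosen uniformly at random."""
    return random.choice(agents)


# Demo on a throwaway sample file with made-up user agents.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "user_agents_sample.txt"
    sample.write_text(
        "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0\n"
        "\n"
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/2.0\n",
        encoding="utf-8",
    )
    agents = load_user_agents(sample)
    # The chosen value would be sent as the User-Agent request header.
    headers = {"User-Agent": pick_user_agent(agents)}
```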

The static directory contains images and .css files.

The templates directory contains .html templates.

The uploads directory is where all keyword files are created and saved for download.

The system creates 7 "popular" keywords/expressions files : 1 file with expressions of all sizes (in words), and one file each for expressions of 1, 2, 3, 4, 5 and 6 words. The same applies to "original" keywords/expressions files. If available, the system provides a maximum of 10,000 expressions for each file. This could be enough to get ideas :-)
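The split into 7 lists per category can be sketched like this. It is an illustration only: the function name `split_by_size`, the input layout (a dict of expression to score) and the sample scores are hypothetical, and the program's own file-writing code is not shown here.

```python
MAX_EXPRESSIONS = 10_000  # cap per file, as stated above


def split_by_size(scored, max_words=6, limit=MAX_EXPRESSIONS):
    """scored: {expression: score}.

    Returns 1 + max_words lists of (expression, score), sorted by descending
    score and capped at `limit`: key "all" for every size, keys 1..max_words
    for expressions of exactly that many words.
    """
    ranked = sorted(
        (
            (expr, score)
            for expr, score in scored.items()
            if 1 <= len(expr.split()) <= max_words
        ),
        key=lambda item: item[1],
        reverse=True,
    )
    groups = {"all": ranked[:limit]}
    for size in range(1, max_words + 1):
        groups[size] = [
            (expr, score) for expr, score in ranked if len(expr.split()) == size
        ][:limit]
    return groups


# Hypothetical global scores.
scored = {
    "seo": 0.30,
    "keyword research": 0.25,
    "web marketing": 0.20,
    "search engine optimization": 0.15,
    "tools": 0.10,
}
groups = split_by_size(scored)
```

Running the same function once on the "popular" scores and once on the "original" scores gives the 14 lists mentioned in this README.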

How to test the program on your computer :

Download the .zip file of this application https://github.com/Anakeyn/TFIDFKeywordsSuggest/archive/master.zip and unzip it in a directory on your computer.

Download and install Anaconda https://www.anaconda.com/distribution/#download-section

Anaconda will install several tools on your computer, including Anaconda Prompt and Spyder, which are used in the next steps.

Open Anaconda Prompt and go to the directory where you installed the application previously (for example for Windows : cd c:\Users\myname\document......\)

Make sure the file "requirements.txt" is in your directory, using dir (Windows) or ls (Linux).

Install the library dependencies for the Python code with one of the following commands :

For Linux : while read requirement; do conda install --yes $requirement || pip install $requirement; done < requirements.txt

For Windows : FOR /F "delims=~" %f in (requirements.txt) DO conda install --yes "%f" || pip install "%f"

Next launch Spyder and open the main Python file tfidfkeywordssuggest.py

Make sure that you are in the right directory, then click on the green arrow to run the Python file.

Next, open a browser and go to the address http://127.0.0.1:5000

Click on "Keywords Suggest" : the system is protected, so provide the default admin credentials (admin, adminpwd) or the default guest credentials (guest, guestpwd).

Next, choose an expression and a target country/language.

The system will search Google for pages responding to the keyword, save the pages, get the content and calculate a TF-IDF for each term found in the pages. Next, it will provide 14 files with up to 10,000 popular or original expressions.
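The "get the content" and "term" steps can be sketched with the Python standard library. This is an assumption-laden illustration, not the program's code: the class `TextExtractor`, the helpers `visible_text` and `ngrams`, and the sample HTML are hypothetical. It treats a term as any run of 1 to `max_words` consecutive words.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text of a page, without tags, scripts or styles."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def ngrams(text, max_words=6):
    """Return every expression of 1..max_words consecutive words, lowercased."""
    words = re.findall(r"\w+", text.lower())
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_words + 1)
        for i in range(len(words) - n + 1)
    ]


# Hypothetical page content.
html = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>SEO tools</h1><p>Keyword research tools</p>"
    "<script>var x = 1;</script></body></html>"
)
text = visible_text(html)
terms = ngrams(text, max_words=2)
```

Note that this simplified version lets expressions span element boundaries (for example "tools keyword" above); a real extractor would probably split on sentences or blocks first.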

As you can see, not all languages are filtered by Google (see the "lr" parameter here to get the list : https://developers.google.com/custom-search/docs/xml_results_appendices#lrsp). However, with the country filter and the language specified in the user agent, the results are often usable.
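To make the role of the "lr" parameter concrete, here is a hedged sketch of how a search URL could be assembled. How the program actually builds its requests is not shown in this README; the function `build_search_url` is hypothetical, and the sketch only assumes the usual `q`, `num`, `hl` and `lr` query parameters and a country-specific Google domain.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(query, tld="com", hl=None, lr=None, num=10):
    """Build a Google search URL (illustrative only).

    tld - country top-level domain of the Google site, e.g. "fr"
    hl  - interface language code, e.g. "fr"
    lr  - language restriction such as "lang_fr"; left out when Google
          offers no filter for the language
    """
    params = {"q": query, "num": num}
    if hl:
        params["hl"] = hl
    if lr:
        params["lr"] = lr
    return f"https://www.google.{tld}/search?{urlencode(params)}"


# French in France: a language filter is available.
url_fr = build_search_url("SEO", tld="fr", hl="fr", lr="lang_fr")

# Swahili in the Democratic Republic of Congo: no "lr" value is passed,
# so only the country domain and the interface language are set.
url_sw = build_search_url("SEO", tld="cd", hl="sw")

query_fr = parse_qs(urlsplit(url_fr).query)
query_sw = parse_qs(urlsplit(url_sw).query)
```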

[Screenshot : results of original 2-word expressions for "SEO" in Swahili in the Democratic Republic of Congo]
