Web scraping tool for data mining analysis. Contains a Python script that performs the web scraping, plus the instructions needed to set up the Gigablast search engine.


Web Scraping Tool For Data Mining

What does it do?

The application searches the Internet through the Gigablast search engine, using terms that the user enters in a CSV file located at the root of the application. After the search is performed, the application creates .txt files that can be imported into any platform for data mining analysis.
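The flow described above (read terms from a CSV file, search for each one, save each result as a .txt file) can be sketched roughly as follows. This is only an illustration, not the code in text_mining.py: the function names, the `result_N.txt` naming scheme, and the assumption that every non-empty CSV cell is one search term are all hypothetical, and the actual Gigablast request is replaced by a pluggable `search` callable so the sketch runs offline.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, List


def read_terms(csv_path: Path) -> List[str]:
    """Return every non-empty cell of the CSV file as a search term (assumed layout)."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        return [cell.strip() for row in csv.reader(handle) for cell in row if cell.strip()]


def run_searches(csv_path: Path, out_dir: Path, search: Callable[[str], str]) -> List[Path]:
    """Call `search` for each term and save each result to its own .txt file."""
    written = []
    for index, term in enumerate(read_terms(csv_path), start=1):
        out_file = out_dir / f"result_{index}.txt"  # hypothetical naming scheme
        out_file.write_text(search(term), encoding="utf-8")
        written.append(out_file)
    return written


if __name__ == "__main__":
    # Stand-in for the real search request, so nothing is sent over the network.
    def fake_search(term: str) -> str:
        return f"results for {term}"

    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "query.csv").write_text("python\ndata mining\n", encoding="utf-8")
        for path in run_searches(folder / "query.csv", folder, fake_search):
            print(path.name, "->", path.read_text(encoding="utf-8"))
```

Keeping the search step as a parameter makes it possible to try the file handling without spending paid queries.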

What do you need to run the app?

To run the app, you need a Gigablast account, which you can create at Gigablast. Creating the account costs $5.00, and Gigablast charges $0.99 per 1,000 queries. After you create the account, set your userid and your code in the my_params variable inside text_mining.py.

UserId and Gigablast code
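As a rough guide to what setting these values might look like, here is a minimal sketch. It assumes that my_params is a dictionary of query-string parameters; the placeholder values, the `format` key, the `SEARCH_URL` endpoint, and the `build_search_url` helper are assumptions for illustration, so check text_mining.py and your Gigablast account page for the actual names.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: replace with the values from your Gigablast account.
my_params = {
    "userid": "YOUR_USER_ID",
    "code": "YOUR_CODE",
    "format": "json",  # assumed response format, not confirmed by this README
}

# Assumed endpoint; verify it against the Gigablast documentation.
SEARCH_URL = "https://www.gigablast.com/search"


def build_search_url(term: str, params: dict = my_params) -> str:
    """Combine the account parameters with one search term into a request URL."""
    query = dict(params, q=term)  # copy, so my_params itself is not modified
    return f"{SEARCH_URL}?{urlencode(query)}"


if __name__ == "__main__":
    print(build_search_url("data mining"))
```

Because the credentials live in the script, avoid committing your real userid and code to a public repository.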

You will also need Python3 installed on your computer. You can visit the Python website for instructions on how to install Python.

Instructions on how to set up the running environment

Once Python3 is installed on your computer and available in your global environment (instructions on how to make Python3 global in your environment here), you can clone the repo to your computer.

$ git clone https://github.com/diazgilberto/data_mining_project.git
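If you are not sure whether Python3 is reachable from your global environment, a small check like the one below can help. It is a sketch that uses only the standard library; the `find_python_commands` helper is a hypothetical name, and you need some working Python interpreter to run it in the first place.

```python
import shutil
import sys


def find_python_commands(candidates=("python3", "python")):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in candidates}


if __name__ == "__main__":
    print("Running under Python", sys.version.split()[0])
    for name, path in find_python_commands().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Whichever command resolves to a Python 3 interpreter is the one to use when starting text_mining.py.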

After cloning the repo, open your terminal, cd to data_mining_project, and open query.csv. Modify the file with the terms you want to search, then save and close it. In your terminal, run one of the following commands: python3 text_mining.py or python text_mining.py

Which command you use depends on how you set up Python3 in your global environment. Shortly after you start the app, you will see status messages in the terminal. You will also notice new .txt files being created in the root of data_mining_project. After the process finishes, your files are ready for data mining analysis.
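As one possible first step for the analysis, the generated .txt files can be read back into Python. The sketch below assumes the files are UTF-8 text sitting in a single folder; `load_text_files` is a hypothetical helper and not part of this repo.

```python
from pathlib import Path
from typing import Dict


def load_text_files(folder: Path) -> Dict[str, str]:
    """Read every .txt file in `folder` into a {filename: contents} mapping."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(folder.glob("*.txt"))
    }


if __name__ == "__main__":
    # Run from the root of data_mining_project after the script has finished.
    corpus = load_text_files(Path("."))
    for name, text in corpus.items():
        print(f"{name}: {len(text.split())} words")
```

From here, the mapping can be handed to whichever data mining platform or library you prefer.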