Bid_Extraction_from_WebPage

Extraction of bid numbers from a web page.

The model consists of several layers. An embedding layer first maps tokens to word embeddings, followed by a dropout layer to reduce overfitting. A 1D convolutional layer then extracts features and is followed by a max pooling layer. An LSTM layer captures long- and short-term dependencies. Finally, a dense layer with softmax activation outputs the probability that a line is a bid number.
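The layer stack described above can be sketched in Keras as follows. This is a minimal illustration, not the repository's actual code: the vocabulary size, sequence length, embedding width, filter count, kernel size, pool size, LSTM units, dropout rate, and the two-class output are all assumed values chosen only to make the example runnable.

```python
from tensorflow.keras import layers, models

# All sizes below are assumptions for illustration only.
VOCAB_SIZE = 5000   # number of distinct tokens
MAX_LEN = 50        # padded length of each tokenized line
EMBED_DIM = 64      # width of each word embedding


def build_model():
    """Embedding -> Dropout -> Conv1D -> MaxPooling1D -> LSTM -> Dense(softmax)."""
    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),         # word embeddings
        layers.Dropout(0.2),                             # reduces overfitting
        layers.Conv1D(64, 5, activation="relu"),         # local feature extraction
        layers.MaxPooling1D(pool_size=4),                # downsample the features
        layers.LSTM(64),                                 # sequence dependencies
        layers.Dense(2, activation="softmax"),           # P(not bid), P(bid)
    ])
    model.compile(
        loss="sparse_categorical_crossentropy",
        optimizer="adam",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_model().summary()` prints the resulting layer shapes. The weights restored from the checkpoint only fit if these hyperparameters match the ones used in training.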

The pickle file tokenizer.pickle stores the tokenizer and must be loaded before the model can make predictions.

The weights file (checkpoints) must be loaded into the model before it can output predictions.
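The two loading steps above might look like the sketch below. The helper names `load_tokenizer` and `restore_weights` are hypothetical, and the sketch assumes the model is a Keras model that has already been built with the same architecture the weights were saved from. The exact checkpoint format that `load_weights` accepts depends on the Keras version.

```python
import pickle


def load_tokenizer(path="tokenizer.pickle"):
    """Unpickle the tokenizer saved at training time.

    Only unpickle files you trust: pickle can execute arbitrary code.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


def restore_weights(model, checkpoint_path="checkpoints"):
    """Load trained weights into an already-built Keras model (assumed)."""
    model.load_weights(checkpoint_path)
    return model
```

Unpickling the tokenizer requires the library that defined it to be importable, so the packages in requirements.txt need to be installed first.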

BeautifulSoup is used to parse the page at the URL given in the link variable, using the lxml parser. The tags to be checked are listed in desired_elements.
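A minimal sketch of the parsing step is shown below. It works on an inline HTML string so that it runs offline. How the script actually fetches the page is not shown here, and the tag names in `desired_elements`, the helper `extract_tag_texts`, and the sample markup are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical list of tags to inspect; the real list lives in the script.
desired_elements = ["p", "td"]

# Stand-in for the HTML downloaded from the URL in `link`.
sample_html = """
<html><body>
  <p>Bid No: AB-123</p>
  <table><tr><td>Closing date</td></tr></table>
</body></html>
"""


def extract_tag_texts(html, tags=desired_elements):
    """Return the text of every element whose tag name is in `tags`."""
    soup = BeautifulSoup(html, "lxml")   # requires the lxml package
    # find_all accepts a list of tag names and returns matches in document order.
    return [element.get_text() for element in soup.find_all(tags)]


texts = extract_tag_texts(sample_html)
```

If a listed tag can be nested inside another listed tag, the same text will appear more than once, so the tag list is worth choosing with the target page's structure in mind.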

The text extracted from these tags is then cleaned to remove extra spaces, tabs, and newline characters. The cleaned text is added to a dictionary, which is converted into a DataFrame. The text in the DataFrame is then tokenized and fed into the model, which outputs the probability of each line being a bid number.
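The cleanup and DataFrame steps above can be sketched as follows. The helper names, the `text` column name, and the choice to drop lines that are empty after cleaning are assumptions, not details taken from the script.

```python
import re

import pandas as pd


def clean_text(raw):
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def texts_to_frame(raw_texts):
    """Clean each extracted string and collect the results in a DataFrame."""
    cleaned = [clean_text(text) for text in raw_texts]
    # Assumed behaviour: skip entries that are empty after cleaning.
    data = {"text": [text for text in cleaned if text]}
    return pd.DataFrame(data)


frame = texts_to_frame(["  Bid\tNo:\n 123 ", "\n\t ", "Closing   date"])
```

With a Keras tokenizer, the remaining steps would typically pass `frame["text"]` through `texts_to_sequences`, pad the result with `pad_sequences` to the length the model expects, and then call `model.predict` on the padded array.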

Repository files: README.md, checkpoints, requirements.txt, test_predictions.py, tokenizer.pickle

SutapaSerenya/Bid_Extraction_from_WebPage
