Building Family Trees Using Electoral Data (India)

This is a very brief overview; I hope to update it soon.

I built this project as part of a hiring challenge (to cut to the chase: I was not hired). I scraped 25k+ records from various sources and used them to build family trees. In my initial testing, the approach accurately linked 3+ generations, and with more time it should be able to link deeper ones. It was fun.
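To make "linking generations" concrete, here is a minimal sketch, not the project's actual resolver. It assumes each electoral-roll entry carries the voter's name, one named relative with a relation label ("father", "mother" or "husband"), and a house number; the `VoterRecord` type, the helper names and the exact-name-within-a-house matching rule are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VoterRecord:
    # Hypothetical record shape; field names are illustrative only.
    voter_id: str
    name: str
    relative_name: str
    relation: str  # assumed labels: "father", "mother" or "husband"
    house_no: str


def build_parent_links(records):
    """Map each voter_id to a parent's voter_id found in the same house.

    Uses exact name matches within a house number; real OCR'd and
    translated names would need fuzzy matching instead.
    """
    index = {(r.house_no, r.name): r.voter_id for r in records}
    parent_of = {}
    for r in records:
        if r.relation in ("father", "mother"):
            parent_id = index.get((r.house_no, r.relative_name))
            if parent_id is not None and parent_id != r.voter_id:
                parent_of[r.voter_id] = parent_id
    return parent_of


def generation_depth(parent_of):
    """Length of the longest parent -> child chain (1 when nothing links)."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    def depth(person):
        return 1 + max((depth(c) for c in children.get(person, [])), default=0)

    # Roots are parents that are not themselves someone's child.
    roots = set(children) - set(parent_of)
    return max((depth(r) for r in roots), default=1)


records = [
    VoterRecord("V1", "Ram Lal", "Shyam Lal", "father", "12"),
    VoterRecord("V2", "Mohan Lal", "Ram Lal", "father", "12"),
    VoterRecord("V3", "Sita Devi", "Mohan Lal", "husband", "12"),
    VoterRecord("V4", "Amit Kumar", "Mohan Lal", "father", "12"),
]
links = build_parent_links(records)
print(links)                    # {'V2': 'V1', 'V4': 'V2'}
print(generation_depth(links))  # 3
```

Spouse entries (the "husband" relation) are ignored when measuring depth here; a fuller resolver would attach them next to the parent links.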

Overview

Tech/Tools at Work

Computer Vision (OCR, Canny edge detection)
NLP (language translation: Hindi -> English, Tamil -> English)
Web Scraping
Graphical Analysis
Family Relationship Resolver
Full-Text Search with Text Similarity Score
APIs
Patience: I had to build this in 2 days, plus 2 more days for bug fixes :))
Automation
Cron Jobs
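The "full-text search with text similarity score" item can be illustrated with Python's standard `difflib` module. This is a stand-in: the README does not say which search backend or similarity metric the project uses, and the `normalize`/`search` helpers and the 0.8 threshold are hypothetical choices for the sketch.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace common punctuation with spaces, collapse whitespace."""
    name = re.sub(r"[.,;:'\"()/-]", " ", name.lower())
    return " ".join(name.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def search(query, names, threshold=0.8, limit=5):
    """Return up to `limit` (score, name) pairs scoring >= threshold, best first."""
    hits = []
    for name in names:
        score = similarity(query, name)
        if score >= threshold:
            hits.append((round(score, 3), name))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits[:limit]


print(search("Ram Lal", ["Ram Lal.", "Ramlal", "Mohan Lal", "Sita Devi"]))
# [(1.0, 'Ram Lal.'), (0.923, 'Ramlal')]
```

Normalizing before scoring lets OCR noise such as stray periods or doubled spaces still match, while a near-miss like "Ramlal" scores below a perfect match so it ranks second.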

Set Up Tesseract (OCR)

https://tesseract-ocr.github.io/tessdoc/Installation.html

sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

Search for language packs: apt-cache search tesseract-ocr
Language packs: https://github.com/tesseract-ocr/tessdata/tree/main/script
Language data location: /usr/share/tesseract-ocr/4.00/tessdata (the version directory may differ depending on the installed Tesseract version)
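Once Tesseract is installed, OCR can be driven from Python through the `tesseract` command line. A minimal sketch, assuming Hindi and Tamil data files (`hin.traineddata`, `tam.traineddata`) are present, for example via the `tesseract-ocr-hin` and `tesseract-ocr-tam` apt packages. The helper names are hypothetical, and the project's own OCR code may use a wrapper library instead of calling the CLI.

```python
import shutil
import subprocess
from pathlib import Path

# Location given above; adjust for your installed Tesseract version.
TESSDATA_DIR = Path("/usr/share/tesseract-ocr/4.00/tessdata")


def installed_languages(tessdata_dir=TESSDATA_DIR):
    """Return language codes that have a .traineddata file in tessdata_dir."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def build_ocr_command(image, langs, tessdata_dir=None):
    """Build a tesseract CLI call that writes recognized text to stdout."""
    cmd = ["tesseract", str(image), "stdout", "-l", "+".join(langs)]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    return cmd


def ocr_image(image, langs=("hin", "eng")):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH; install it first")
    result = subprocess.run(
        build_ocr_command(image, langs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Usage (requires tesseract plus the language data):
# print(installed_languages())
# print(ocr_image("page.png", ["hin", "eng"]))
```

Joining codes with `+` asks Tesseract to combine several languages in one pass, which helps on electoral roll pages that mix Hindi or Tamil with English.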

Folders and files

doc
.gitignore
README.md
api.py
captcha_solver.py
cron_task.py
db.py
electoral_pdf.py
family_relationships.py
graph_linking.py
image_processing.py
main.py
pdf.py
requirements.txt
runner.py
scraper_handler.py
text_processing.py
voter_info.py

Repository: divinenaman/family-tree-builder