1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.idea/
46 changes: 36 additions & 10 deletions README.md
@@ -1,21 +1,47 @@
# Build Experience Profile from Resumes

In this project, we use text extraction and retireval for the following functions:

1. Get useful information from a candidate’s (prospective employee) resume
2. Build an experience profile connection candidate’s experience in various tools and technologies, for each of the resumes
3. Rank the available set of resumes based on the skill set, using the keywords provided in the query
1. **What are the names and NetIDs of all your team members? Who is the captain? The captain will have more administrative duties than team members.**

The current keyword based search used by many online websites might not be entirely accurate, as the coorelation between the skills and the experience is often missing.
* alokk3@illinois.edu
* dkrovi2@illinois.edu
* jsaxena3@illinois.edu
* rathi9@illinois.edu

For example, for a skill set of ‘Spark’, instead of just searching for the keyword ‘Spark’ in the resume, we want to know (for scoring purpose)
- if the employee worked in Spark for X number of years,
- did he have experience on Spark, in multiple organizations.
2. **What is your free topic? Please give a detailed description. What is the task? Why is it important or interesting? What is your planned approach? What tools, systems or datasets are involved? What is the expected outcome? How are you going to evaluate your work?**

We then create a score for each profile/resume based on the skill set mentioned in the query and rank them in order of score (highest to lowest).
In this project, we use text extraction and retrieval for the following functions:

We use the standard text retireval tools and programming APIs (MeTA, python, numpy etc) with a customized algorithm to score each resume.
* Parse resumes in doc and pdf format
* Parse job descriptions in doc and pdf format
* Build an analysis engine to extract experience details of a candidate on various tools and technologies
* Rank the available set of resumes based on the skill set specified in the job description
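As a rough illustration of what the analysis engine might produce, here is a minimal sketch that pulls per-skill experience out of plain text. It assumes the resume has already been converted from doc or pdf to text, and that each job entry follows a made-up `Organization (start-end): skill, skill` line format. The function name `extract_experience` and the sample data are hypothetical and not part of this project.

```python
import re
from collections import defaultdict

# Hypothetical line format, assumed for illustration only:
#   "<organization> (<start year>-<end year>): <skill>, <skill>, ..."
ENTRY = re.compile(
    r"^(?P<org>.+?)\s*\((?P<start>\d{4})-(?P<end>\d{4})\):\s*(?P<skills>.+)$"
)


def extract_experience(resume_text):
    """Return {skill: [(organization, years), ...]} from plain resume text."""
    experience = defaultdict(list)
    for line in resume_text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed format
        years = int(match["end"]) - int(match["start"])
        for skill in match["skills"].split(","):
            # Normalize skill names so "Spark" and "spark" map to one key.
            experience[skill.strip().lower()].append((match["org"], years))
    return dict(experience)


sample = """Jane Doe
Acme Corp (2015-2018): Spark, Hadoop
Globex (2018-2020): Spark, Python"""

profile = extract_experience(sample)
```

Real resumes are far less regular than this, so an actual engine would need more robust section and date detection; the sketch only shows the shape of the output, a mapping from skill to the organizations and durations where it was used.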

The keyword-based search used by many online websites can be inaccurate, because it often misses the correlation between a skill and the candidate's experience with it.

For example, for the skill ‘Spark’, instead of just searching for the keyword ‘Spark’ in the resume, we want to know (for scoring purposes):
- whether the candidate worked with Spark for X years,
- whether the candidate has Spark experience at multiple organizations.

We then create a score for each profile/resume based on the skill set mentioned in the query and rank them in order of score (highest to lowest).
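The scoring idea above could be sketched as follows. This is only one possible formula, not the project's actual algorithm: it assumes each resume has already been reduced to a mapping from skill to `(organization, years)` pairs, and the names `score_resume`, `rank_resumes`, and the `org_bonus` weight are hypothetical choices for illustration.

```python
def score_resume(experience, query_skills, org_bonus=0.5):
    """Sum, over the query skills, total years plus a bonus per extra organization."""
    score = 0.0
    for skill in query_skills:
        entries = experience.get(skill.lower(), [])
        if not entries:
            continue  # skill not found in this resume
        total_years = sum(years for _, years in entries)
        # Reward experience with the same skill at more than one organization.
        extra_orgs = len({org for org, _ in entries}) - 1
        score += total_years + org_bonus * extra_orgs
    return score


def rank_resumes(resumes, query_skills):
    """Return (name, score) pairs ordered from highest to lowest score."""
    scored = [(name, score_resume(exp, query_skills)) for name, exp in resumes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up sample data: skill -> list of (organization, years).
resumes = {
    "candidate_a": {"spark": [("Acme Corp", 3), ("Globex", 2)]},
    "candidate_b": {"spark": [("Initech", 4)]},
    "candidate_c": {"python": [("Initech", 6)]},
}

ranking = rank_resumes(resumes, ["Spark"])
```

With this sample data, `candidate_a` ranks first (5 years across two organizations), `candidate_b` second, and `candidate_c` last, since a plain keyword match on other skills contributes nothing to a ‘Spark’ query.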

3. **Which programming language do you plan to use?**

We will use Python, together with standard text retrieval tools and programming APIs (MeTA, NumPy, etc.) and a customized algorithm to score each resume.

4. **Please justify that the workload of your topic is at least 20 \* N hours, N being the total number of students in your team. You may list the main tasks to be completed, and the estimated time cost for each task.**

The following are the steps and key milestones for this project:

| Task | Time needed | Completion date |
|:--------------------------------------------------------------|-------------:|----------------:|
| Parsing engine to parse resumes and job descriptions | 20 hours | Nov 15 |
| Progress report | 2 hours | Nov 15 |
| Analysis engine to analyze resumes | 30 hours | Nov 22 |
| Scoring engine to match resumes to provided job description | 30 hours | Nov 29 |
| Basic UI to search for resumes matching a job description | 24 hours | Dec 5 |
| Software documentation | 8 hours | Dec 9 |
| **Total** |**114 hours** | |

# Contributors
