# Determining Relevant Job Title Pictures

The objective of this short project is to automatically find relevant pictures for job titles, so that we can show them on the pages that explain what those job titles do.

We are looking for pictures that represent the activities typically associated with that job title, showing people or tools used on the job.

## Dataset

The dataset provided is a .csv file with a set of pictures queried from Google Images, with the following available metadata and pre-extracted features:

* Job Title: the job we are trying to get pictures for.

* Pic Title: the picture title according to Google Images.

* Num Resumes: the number of resumes we have available for the job title. This indicates how popular (common) the job title is.

* Google Position: the position of the image in the search results.

* Strict Face Count: face count according to a high-precision face detection algorithm from OpenCV. When it detects a face, it is a human face almost 100% of the time, but it doesn't detect all human faces.

* Relaxed Face Count: face count according to a high-recall face detection algorithm from OpenCV. It detects almost all human faces, but it also mistakes some other shapes and objects for human faces.

* KB size: size of the image in kilobytes.

* Height: pixel height of the image.

* Width: pixel width of the image.

* Resolution: total pixels in the image.

* Text regions: number of text regions identified in the image, also using OpenCV.

* Picture URL: the URL of the image.

* Manual Label: a label indicating if the picture is good (relevant) or bad (irrelevant).
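As a starting point, the fields above could be read with Python's standard `csv` module. This is a minimal sketch under stated assumptions: the header names are copied from the list above and may not match the real file exactly, the two sample rows are made up, and unlabeled rows are assumed to have an empty `Manual Label` cell.

```python
import csv
import io

# Hypothetical two-row sample; header names are assumed from the field list above.
SAMPLE_CSV = """Job Title,Pic Title,Num Resumes,Google Position,Strict Face Count,Relaxed Face Count,KB size,Height,Width,Resolution,Text regions,Picture URL,Manual Label
Nurse,Nurse with patient,1200,1,2,3,85,600,800,480000,0,http://example.com/a.jpg,good
Nurse,Nursing salary chart,1200,7,0,1,40,400,500,200000,6,http://example.com/b.jpg,
"""

NUMERIC_COLUMNS = [
    "Num Resumes", "Google Position", "Strict Face Count", "Relaxed Face Count",
    "KB size", "Height", "Width", "Resolution", "Text regions",
]


def load_rows(handle):
    """Read rows as dicts and convert the numeric columns to float."""
    rows = []
    for row in csv.DictReader(handle):
        for name in NUMERIC_COLUMNS:
            row[name] = float(row[name])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Assumption: a blank Manual Label cell means the row has no reference label.
labeled = [r for r in rows if r["Manual Label"].strip()]
unlabeled = [r for r in rows if not r["Manual Label"].strip()]
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="")`, where `path` is wherever the provided .csv lives.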

## Objective

The objective of the project is to create an algorithm that will produce a label ("good" or "bad") for every image provided in the dataset. You can use any approach you want for this. Around 130 labeled images are provided for you to use as reference.
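To make the expected output concrete, here is one possible baseline, not a recommended solution. It is a hand-written rule that labels a picture "good" when the strict detector finds at least one face and few text regions are present. Both the rule and the `max_text_regions` threshold are illustrative assumptions; a rule like this would, for example, miss good pictures that show tools rather than people.

```python
def predict_label(row, max_text_regions=2):
    """Hypothetical rule: a visible face and little text -> "good", else "bad"."""
    has_face = row["Strict Face Count"] >= 1
    low_text = row["Text regions"] <= max_text_regions
    return "good" if has_face and low_text else "bad"


# Made-up feature values, only to show the shape of the input and output.
examples = [
    {"Strict Face Count": 2, "Text regions": 0},
    {"Strict Face Count": 0, "Text regions": 6},
    {"Strict Face Count": 1, "Text regions": 5},
]
predictions = [predict_label(r) for r in examples]
```

Any threshold chosen this way should be checked against the labeled images rather than taken from this sketch.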


## Evaluation

The results of the project will be evaluated on the following aspects:

* Formatting of the results: the results of your work should be provided in the same format as the test: an .ipynb notebook with the code, and a .csv file with the data. The only difference between the input and output .csv files should be one, and only one, additional column at the end of the sheet (column N) containing the labels calculated by your solution. This is so we can automatically measure the accuracy of the results, and to keep the format consistent across candidates. Failure to provide the results in the specified format will most likely result in a dismissed application.

* Objective quality of the results: we have a set of labels that your results will be compared against. The higher the accuracy, the better.

* Logic applied and simplicity of the solution. Everything else being equal, a well-reasoned and simple solution will be considered better than a seemingly random and complicated solution.
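The formatting and accuracy criteria above can be sketched as follows. The code copies an input CSV unchanged, appends exactly one column of labels at the end, and computes accuracy as the fraction of matching labels. The column header `Predicted Label`, the abbreviated three-column input, and the sample values are assumptions for illustration; the actual header and scoring script are not specified here.

```python
import csv
import io


def write_with_labels(in_handle, out_handle, labels, column_name="Predicted Label"):
    """Copy the input CSV and append exactly one column holding `labels`."""
    reader = csv.reader(in_handle)
    header = next(reader)
    rows = list(reader)
    if len(rows) != len(labels):
        raise ValueError("need exactly one label per data row")
    writer = csv.writer(out_handle, lineterminator="\n")
    writer.writerow(header + [column_name])
    for row, label in zip(rows, labels):
        writer.writerow(row + [label])


def accuracy(predicted, reference):
    """Fraction of positions where the two label lists agree."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical, abbreviated input; the real file has 13 columns (A to M).
source = io.StringIO(
    "Job Title,Picture URL,Manual Label\n"
    "Nurse,http://example.com/a.jpg,good\n"
    "Nurse,http://example.com/b.jpg,bad\n"
    "Chef,http://example.com/c.jpg,good\n"
)
predicted = ["good", "bad", "bad"]
output = io.StringIO()
write_with_labels(source, output, predicted)
out_lines = output.getvalue().splitlines()

reference = ["good", "bad", "good"]
score = accuracy(predicted, reference)
```

Because the original columns are written back untouched and in the same order, the new labels land in the single extra column at the end of the sheet.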

## Tips

* Look at a few labeled pictures in your browser to understand what makes a good picture and what makes a bad picture.

* When you produce a set of labels, look at a few pictures outside of the pre-labeled set to see if you generally agree with your algorithm's labels.

* Do not re-download and re-process the pictures. Image processing can be time-consuming and the most important related features have already been provided.
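One way to follow the spot-check tip is to draw a small random sample of pictures that have no manual label and open their URLs in a browser next to the predicted label. The sketch below only collects URLs and does not download anything. The function name `sample_for_review`, the column names, and the sample rows are assumptions for illustration.

```python
import random


def sample_for_review(rows, predictions, k=5, seed=0):
    """Pick up to k (URL, predicted label) pairs from rows without a manual label."""
    candidates = [
        (row["Picture URL"], pred)
        for row, pred in zip(rows, predictions)
        if not row["Manual Label"].strip()
    ]
    rng = random.Random(seed)  # fixed seed so the same sample can be revisited
    return rng.sample(candidates, min(k, len(candidates)))


# Made-up rows: one labeled, three unlabeled.
rows = [
    {"Picture URL": "http://example.com/a.jpg", "Manual Label": "good"},
    {"Picture URL": "http://example.com/b.jpg", "Manual Label": ""},
    {"Picture URL": "http://example.com/c.jpg", "Manual Label": ""},
    {"Picture URL": "http://example.com/d.jpg", "Manual Label": ""},
]
predictions = ["good", "bad", "good", "bad"]

to_review = sample_for_review(rows, predictions, k=2)
for url, label in to_review:
    print(label, url)
```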