Avantari Technologies - Machine Learning Task Solution

Creating an AutoEncoder model for finding similar images and partitioning the dataset into K groups.

For a detailed explanation, please refer to README.pdf.

Sample outputs

The outputs from two different executions of the final solution notebook (final_solution.ipynb) are saved in HTML format for quick viewing.

Please see the following files inside the "Sample outputs" directory:

final_solution_1_default.html
final_solution_2_cached.html

These files show what an execution of the final solution notebook looks like. Use them if final_solution.ipynb fails to execute on your system.

Requirements to execute final_solution.ipynb on your own

Python 3.7.x or 3.8.x
Jupyter Notebook: pip install jupyter
TensorFlow 2.3.0: pip install tensorflow-gpu==2.3.0
Matplotlib: pip install matplotlib
Pillow: pip install pillow
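
Before opening the notebook, it can help to confirm that the environment matches the list above. The following is a minimal sketch, not part of the repository: the helper names `python_version_ok` and `missing_modules` are hypothetical, and the import names (`notebook`, `tensorflow`, `matplotlib`, `PIL`) are the usual ones for these packages. It only checks that the modules can be found; it does not verify the TensorFlow version.

```python
import importlib.util
import sys

# Import names for the packages listed above (assumption: standard names).
# Pillow is imported as "PIL"; tensorflow-gpu is imported as "tensorflow".
REQUIRED_MODULES = ["notebook", "tensorflow", "matplotlib", "PIL"]


def python_version_ok(version_info=sys.version_info):
    """Return True for Python 3.7.x or 3.8.x."""
    return tuple(version_info[:2]) in {(3, 7), (3, 8)}


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_version_ok():
        print("Warning: Python 3.7.x or 3.8.x is expected.")
    for name in missing_modules():
        print(f"Missing module: {name}")
```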

Steps to execute

Clone this GitHub repo: $ git clone https://github.com/Shashank9830/avantari_tech
Change directory to avantari_tech: $ cd avantari_tech
Load the solution notebook: $ jupyter notebook final_solution.ipynb
Edit the variables in the 3rd code cell before execution.
Set appropriate values for input_file, mode and sim_count (details are given in the comments).
Run all cells.
For multiple executions, edit the 3rd code cell as required each time, then re-run it and all code cells below it.

Understanding final cell output

After all cells have executed, the final cell output is laid out as follows:

The image in the first row is the input image.
The images in the subsequent rows are similar images, ranked in order of decreasing similarity.
Similarity decreases from left to right, then top to bottom.
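
The reading order described above is row-major. As an illustration only, the sketch below maps a 0-based similarity rank to a grid cell; the function names and the `columns` parameter are hypothetical, since the number of images per row used by the notebook is not stated here.

```python
def grid_position(rank, columns):
    """Map a 0-based similarity rank to a (row, column) grid cell.

    Row 0 is reserved for the input image, so the most similar
    image (rank 0) lands at row 1, column 0.
    """
    if rank < 0 or columns < 1:
        raise ValueError("rank must be >= 0 and columns must be >= 1")
    row, col = divmod(rank, columns)
    return row + 1, col


def layout(ranked_images, columns):
    """Split a list ranked by decreasing similarity into display rows."""
    if columns < 1:
        raise ValueError("columns must be >= 1")
    return [ranked_images[i:i + columns]
            for i in range(0, len(ranked_images), columns)]
```

With this mapping, an image that appears further right in a row, or in a lower row, is always less similar to the input than the images before it.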

Solution details

Refer to README.pdf for a detailed explanation of the approach used to find N similar images and to partition the dataset into K groups.
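
For orientation, the idea of ranking images by the cosine similarity of their encodings can be sketched as follows. This is a simplified pure-Python illustration under stated assumptions, not the code in the repository: the function names are hypothetical, encodings are treated as flat lists of floats, and the actual notebooks presumably work on NumPy arrays (encodings.npy).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors (assumption)
    return dot / (norm_a * norm_b)


def top_n_similar(encodings, query_index, n):
    """Return indices of the n encodings most similar to the query.

    Results are ordered by decreasing similarity; ties are broken
    by the lower index. The query itself is excluded.
    """
    query = encodings[query_index]
    scored = [(cosine_similarity(query, enc), i)
              for i, enc in enumerate(encodings) if i != query_index]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:n]]
```

Here `n` plays the role that sim_count appears to play in the notebook, which is an assumption based on the variable name.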

File information

No.	Filename	Type	Information
1	dataset	Directory	Original dataset.
2	resize_dataset.py	Python	Resizes the dataset images to 256x256.
3	resized_256	Directory	Resized dataset.
4	create_autoencoder.py	Python	Creates an autoencoder model.
5	autoencoder.h5	H5	AutoEncoder model saved in H5 format.
6	trainer_notebook.ipynb	Jupyter	Model training code.
7	trained_autoencoder.h5	H5	Trained autoencoder saved in H5 format.
8	trained_encoder.h5	H5	Encoder part of the trained autoencoder.
9	get_encodings.ipynb	Jupyter	Code to get the encodings of all the images.
10	encodings.npy	NumPy	Encodings of all 4738 images.
11	get_similarity.ipynb	Jupyter	Code to find similarity of all the images with each other.
12	cosine_similarity_matrix.npy	NumPy	Cosine similarity matrix generated in the previous step.
13	sim_mat_sorted.json	JSON	Images sorted in decreasing order of similarity to each other.
14	final_solution.ipynb	Jupyter	Main user notebook. Run this for final output.
15	Sample outputs	Directory	Pre-executed notebook outputs in HTML format, one each for cached and default mode.
16	k_grouping.py	Python	Code implementing the Elbow method and the K-medoids algorithm.
17	k_groups.json	JSON	JSON file containing the list of medoids and clusters.
18	partition_dataset.py	Python	Code to partition the dataset as described in k_groups.json.
19	K Groups	Directory	Folder containing the K groups.
20	.ipynb_checkpoints	---	---
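
The table mentions K-medoids for grouping. As a rough sketch of that technique (not the contents of k_grouping.py), the following runs a simple alternating K-medoids on a precomputed distance matrix. The function names, the deterministic initialisation, and the choice of distance (for example, 1 minus cosine similarity) are all assumptions; the Elbow step for choosing K is not shown.

```python
def assign_clusters(dist, medoids):
    """Assign every point to its nearest medoid."""
    clusters = {m: [] for m in medoids}
    for i in range(len(dist)):
        nearest = min(medoids, key=lambda m: dist[i][m])
        clusters[nearest].append(i)
    return clusters


def k_medoids(dist, k, max_iter=100):
    """Alternating K-medoids on a square distance matrix.

    dist[i][j] is the distance between points i and j.
    Returns (medoids, clusters), where clusters maps each
    medoid index to the list of point indices assigned to it.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    medoids = list(range(k))  # simple deterministic start (assumption)
    for _ in range(max_iter):
        clusters = assign_clusters(dist, medoids)
        # Within each cluster, pick the member with the lowest total
        # distance to the other members; keep the medoid if it is empty.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return medoids, assign_clusters(dist, medoids)
```

Because each medoid is an actual data point, the result maps naturally onto a JSON file listing medoid images and their cluster members.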

Authors

Shashank Singh - Complete work - shashank9830
