# Notebook to analyze interests of Fukudon's followers. Please read the text below.

# Important Information
This notebook was run on Google Colab Pro (Google Drive <b>not</b> mounted). To replicate this notebook:
<ol>
<li> Import the notebook into Colab</li>
<li> Download resnet50_coco_best_v2.1.0.h5 from this link: https://github.com/OlafenwaMoses/ImageAI/releases/download/essentials-v5/resnet50_coco_best_v2.1.0.h5/</li>
<li> Place model in root directory of Google Colab</li>
<li> Import "recent_five_pictures" folder into root directory of Google Colab (recent_five_pictures folder can be found in "data" folder)</li>
<li> Create "recent_five_pictures_output" folder in root directory of Google Colab</li>
<li> Change runtime to GPU</li>
<li> Install the dependencies and run the code (you may be asked to restart the runtime after installing)</li>
</ol>

# Expected Directory
In Google Colab, your directory should now consist of 3 folders and 1 file:
<ol>
<li>recent_five_pictures: folder with photos</li>
<li>recent_five_pictures_output: empty folder. Once object detection is performed, pictures with bounding boxes will be saved here.</li>
<li>sample_data: default folder from Google Colab</li>
<li>resnet50_coco_best_v2.1.0.h5: model file</li>
</ol>  
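
A quick way to confirm this layout before running the detection cells is sketched below. The `missing_paths` helper is hypothetical (it is not part of the original notebook), and it assumes the Colab root is `/content`, as the paths used later in this notebook suggest.

```python
import os

# Folders and file this notebook expects under the Colab root (from the list above).
EXPECTED_DIRS = ["recent_five_pictures", "recent_five_pictures_output", "sample_data"]
EXPECTED_FILES = ["resnet50_coco_best_v2.1.0.h5"]


def missing_paths(root="/content"):
    """Return the expected folders and files that are absent under `root`."""
    missing = [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(root, d))]
    missing += [f for f in EXPECTED_FILES if not os.path.isfile(os.path.join(root, f))]
    return missing
```

Once the setup steps are complete, `missing_paths()` should return an empty list; anything it returns still needs to be uploaded or created.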

# Methodology
This notebook aims to gather demographic information about Fukudon's followers, namely their interests. We do so by using ImageAI to perform object detection on the five most recent posts of EACH of Fukudon's followers.<br><br>
We perform object detection only on <b>active and non-spam users</b>, defined by the following criteria:
1. Public accounts
2. Number of posts > 5
3. Number of followers > 100 
4. Number of followees > 2/3 of the number of followers
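
The four criteria above could be expressed as a single predicate, as in the minimal sketch below. The `Profile` class and its field names are hypothetical, since the notebook does not show how follower metadata is stored; only the thresholds come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical container for the follower fields the criteria need."""
    is_private: bool
    posts: int
    followers: int
    followees: int


def is_active_non_spam(p: Profile) -> bool:
    """Apply the four criteria listed above, in order."""
    return (
        not p.is_private                       # 1. public account
        and p.posts > 5                        # 2. more than 5 posts
        and p.followers > 100                  # 3. more than 100 followers
        and 3 * p.followees > 2 * p.followers  # 4. followees > 2/3 of followers
    )
```

The last comparison is written with integers (`3 * followees > 2 * followers`) so that the 2/3 threshold is not affected by floating-point rounding.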

# Acknowledgement
While modifications were made to suit our use case, the code draws heavily on: https://towardsdatascience.com/object-detection-with-10-lines-of-code-d6cb4d86f606

## Install and import dependencies

In [2]:
!pip3 install tensorflow==2.4.0
!pip3 install opencv-python
!pip3 install keras
!pip3 install imageai --upgrade

from imageai.Detection import ObjectDetection
import os

In [7]:
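execution_path = os.getcwd()  # current working directory (/content on Colab)
# Load the pretrained RetinaNet model from the downloaded .h5 file.
detector = ObjectDetection()
detector.setModelTypeAsRetinaNet()
detector.setModelPath("/content/resnet50_coco_best_v2.1.0.h5")
detector.loadModel()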



In [10]:
# Count how often each object class is detected across all images.
object_frequency_dict = {}
for filename in os.listdir("/content/recent_five_pictures/"):
    if filename.endswith(".png"):
        input_path = f"/content/recent_five_pictures/{filename}"
        output_path = f"/content/recent_five_pictures_output/{filename}"
        # Run detection and save a copy with bounding boxes to output_path.
        detections = detector.detectObjectsFromImage(input_image=input_path, output_image_path=output_path)
        for eachObject in detections:
            # Only count detections with more than 70% confidence.
            if eachObject["percentage_probability"] > 70:
                name = eachObject["name"]
                object_frequency_dict[name] = object_frequency_dict.get(name, 0) + 1

In [18]:
object_frequency_dict

{'backpack': 3,
 'bed': 3,
 'bench': 3,
 'bird': 1,
 'boat': 2,
 'book': 1,
 'bottle': 6,
 'bowl': 98,
 'broccoli': 5,
 'cake': 22,
 'car': 50,
 'carrot': 2,
 'cat': 5,
 'cell phone': 15,
 'chair': 19,
 'clock': 6,
 'couch': 3,
 'cup': 58,
 'dining table': 25,
 'dog': 10,
 'donut': 4,
 'fire hydrant': 1,
 'fork': 3,
 'frisbee': 3,
 'handbag': 4,
 'hot dog': 3,
 'keyboard': 1,
 'knife': 5,
 'laptop': 1,
 'motorcycle': 3,
 'mouse': 1,
 'orange': 6,
 'person': 745,
 'pizza': 4,
 'potted plant': 14,
 'sandwich': 12,
 'snowboard': 1,
 'spoon': 5,
 'teddy bear': 1,
 'tie': 2,
 'toilet': 1,
 'traffic light': 1,
 'truck': 1,
 'tv': 2,
 'umbrella': 2,
 'vase': 3,
 'wine glass': 9}

# Wordcloud

In [None]:
# TODO
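
One possible way to fill in this cell is sketched below, assuming the third-party `wordcloud` package is installed (`pip install wordcloud`). Both helper names are hypothetical. The optional `exclude` argument is an assumption, added because a very frequent label such as `person` (745 detections above) would otherwise dominate the image.

```python
def wordcloud_frequencies(counts, exclude=()):
    """Return a copy of `counts` without the labels listed in `exclude`."""
    return {label: n for label, n in counts.items() if label not in exclude}


def save_wordcloud(counts, path="wordcloud.png"):
    """Render label frequencies as a word cloud image and save it to `path`."""
    from wordcloud import WordCloud  # third-party; imported lazily

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(counts)  # word size is proportional to count
    cloud.to_file(path)
    return path
```

For example, `save_wordcloud(wordcloud_frequencies(object_frequency_dict, exclude=("person",)))` would write `wordcloud.png` with every label except `person`.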

# Bar chart (same data as the word cloud, shown as another visualization)

In [None]:
# TODO
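
A minimal sketch for this cell, assuming `matplotlib` is available (it is preinstalled on Colab). The helper names `top_n` and `plot_top_n` are hypothetical, and limiting the chart to the most frequent labels is a design assumption to keep the chart readable.

```python
def top_n(counts, n=10):
    """Return the `n` most frequent (label, count) pairs, highest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


def plot_top_n(counts, n=10):
    """Draw a horizontal bar chart of the `n` most frequent labels."""
    import matplotlib.pyplot as plt  # third-party; imported lazily

    labels, values = zip(*top_n(counts, n))
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.barh(labels, values)
    ax.invert_yaxis()  # put the most frequent label at the top
    ax.set_xlabel("Detections with > 70% confidence")
    fig.tight_layout()
    return fig
```

Calling `plot_top_n(object_frequency_dict)` in a notebook cell would display the chart; a log-scaled x-axis may be worth trying if one label dwarfs the rest.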

# Clustering and Visualization

Step 1) Encode each key in object_frequency_dict with BERT or GloVe <br>
Step 2) Perform KMeans Clustering <br>
Step 3) Perform t-SNE and visualization <br>

In [None]:
# TODO
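
The three steps above could be wired together roughly as in the sketch below, assuming `numpy` and `scikit-learn` are installed. The `embed` argument is a hypothetical user-supplied function that maps a label to a fixed-length vector (for example a GloVe lookup or a BERT sentence embedding); loading such a model is not shown here. The default of five clusters is an arbitrary placeholder.

```python
def cluster_labels(labels, embed, n_clusters=5, seed=0):
    """Cluster label embeddings with KMeans and project them to 2-D with t-SNE.

    `embed` is a user-supplied callable: label -> fixed-length numeric vector.
    Returns (cluster_ids, coords), one entry per label.
    """
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.manifold import TSNE

    # Step 1: encode each label as a vector.
    X = np.array([embed(label) for label in labels], dtype=float)
    # Step 2: assign each label to a cluster.
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    # Step 3: project to 2-D for plotting; perplexity must be below the sample count.
    perplexity = min(30, len(labels) - 1)
    coords = TSNE(n_components=2, perplexity=perplexity, random_state=seed).fit_transform(X)
    return cluster_ids, coords


def group_by_cluster(labels, cluster_ids):
    """Map each cluster id to the list of labels assigned to it."""
    groups = {}
    for label, cid in zip(labels, cluster_ids):
        groups.setdefault(int(cid), []).append(label)
    return groups
```

With `labels = list(object_frequency_dict)`, the returned `coords` can be passed to a scatter plot colored by `cluster_ids`, and `group_by_cluster` gives a quick textual view of which objects ended up together.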