
# Vision and Cognitive Systems - Project


<a href="https://colab.research.google.com/github/GianmarcoLattaruolo/Vision_Project/blob/main/Vision_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this first cell we check whether the notebook is running in Colab. If so, some additional work is needed to set up the environment properly, and we also need to mount our Google Drive, which holds the project files. On a local machine, instead, we need to add the GeoEstimation folder (the code of our paper) to the paths where Python searches for modules.

In [2]:
# with this line we can check if we are in colab or not
import sys
in_colab = 'google.colab' in sys.modules
print("are we in Colab?:",in_colab)
if in_colab:
    !pip install -q condacolab
    import condacolab
    condacolab.install()
else:
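    import os
    from pathlib import Path
    current_wd = Path.cwd()
    # when started from the repository root, move into the GeoEstimation folder
    if current_wd.name == 'Vision_Project':
        os.chdir(current_wd / 'GeoEstimation')
    # make the GeoEstimation modules importable (pathlib keeps this portable across OSes)
    sys.path.append(str(current_wd / 'GeoEstimation'))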

are we in Colab?: True
⏬ Downloading https://github.com/jaimergp/miniforge/releases/latest/download/Mambaforge-colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:17
🔁 Restarting kernel...


In [2]:
# this cell takes a long time on Colab!
import sys
in_colab = 'google.colab' in sys.modules
if in_colab:
    import condacolab
    condacolab.check()
    from google.colab import drive
    drive.mount('/content/drive')
    import os
    os.chdir(r'/content/drive/MyDrive/GeoEstimation')
    print(os.getcwd())
    !conda env update -n base -f environment.yml
    # Workaround (it seems to work): replace the installed torchtext with 0.7,
    # the release paired with PyTorch 1.6; -y skips the confirmation prompt
    !pip uninstall -y torchtext
    !pip install torchtext==0.7

✨🍰✨ Everything looks OK!
Mounted at /content/drive
/content/drive/MyDrive/GeoEstimation
Collecting package metadata (repodata.json): - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - 

In principle, we need to install specific versions of some packages to reproduce the original environment in which the paper's results were obtained:
```
  - python=3.8
  - msgpack-python=1.0.0
  - pandas=1.1.5
  - yaml=0.2.5
  - tqdm=4.50
  - cudatoolkit=10.2
  - pytorch=1.6
  - torchvision=0.7
  - pytorch-lightning=1.0.1
  - pip
  - pip:
    - s2sphere==0.2.5
```
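To compare the active environment against these pins, a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+) could look like the block below. It assumes the pip distribution names used in the dictionary (`msgpack` for conda's `msgpack-python`, `torch` for `pytorch`); `yaml=0.2.5` (the LibYAML C library), `cudatoolkit` and Python itself are not Python distributions and are left out. `matches_pin` and `check_pins` are hypothetical helpers, not part of the GeoEstimation code.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above, keyed by (assumed) pip distribution names.
PINNED = {
    "msgpack": "1.0.0",
    "pandas": "1.1.5",
    "tqdm": "4.50",
    "torch": "1.6",
    "torchvision": "0.7",
    "pytorch-lightning": "1.0.1",
    "s2sphere": "0.2.5",
}


def matches_pin(installed: str, pin: str) -> bool:
    """True if `installed` equals `pin` or refines it (e.g. "1.6.0+cu101" matches "1.6")."""
    return installed == pin or installed.startswith(pin + ".")


def check_pins(pins, get_version=version):
    """Return {package: (pin, installed or None)} for every package that does not match its pin."""
    mismatches = {}
    for package, pin in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed is None or not matches_pin(installed, pin):
            mismatches[package] = (pin, installed)
    return mismatches


for package, (pin, installed) in check_pins(PINNED).items():
    print(f"{package}: expected {pin}, found {installed}")
```

An empty printout means every listed package matches its pin.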

# Reproduce paper results


To begin, we try to reproduce the paper's results on its im2gps test set.

In [5]:
from pathlib import Path
from math import ceil

import pandas as pd
import torch
import pytorch_lightning as pl

from classification.train_base import MultiPartitioningClassifier # class defining our model
from classification.dataset import FiveCropImageDataset # class for preparing the images before giving them to the NN

## Load the model

In [6]:
# where model's params and hyperparams are saved
checkpoint = "models/base_M/epoch=014-val_loss=18.4833.ckpt"
hparams = "models/base_M/hparams.yaml"

In [7]:
# load_from_checkpoint is a class method of PyTorch Lightning's LightningModule, inherited by MultiPartitioningClassifier
# it rebuilds a previously saved model from a checkpoint file and, optionally, a separate hyperparameters file
# MultiPartitioningClassifier is the class defining our model
model = MultiPartitioningClassifier.load_from_checkpoint(
    checkpoint_path=checkpoint,
    hparams_file=hparams,
    map_location=None
)

Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /root/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth
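
The checkpoint file name encodes the training state at which it was saved. Assuming it follows PyTorch Lightning's `key=value` naming for saved checkpoints, `epoch=014-val_loss=18.4833.ckpt` corresponds to epoch 14 with a validation loss of 18.4833. The hypothetical helper below extracts those fields; it assumes every field is `key=number` and that fields are joined by `-` (so it would not handle negative values).

```python
import re
from pathlib import Path


def parse_checkpoint_name(path: str) -> dict:
    """Split a name like 'epoch=014-val_loss=18.4833.ckpt' into its key=value fields."""
    stem = Path(path).stem  # drops the directory and the '.ckpt' suffix
    fields = {}
    for key, value in re.findall(r"([A-Za-z_]+)=([0-9.]+)", stem):
        # integer fields (like the zero-padded epoch) become int, the rest float
        fields[key] = int(value) if value.isdigit() else float(value)
    return fields


print(parse_checkpoint_name("models/base_M/epoch=014-val_loss=18.4833.ckpt"))
# {'epoch': 14, 'val_loss': 18.4833}
```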

In [8]:
# use the GPU if one is available
want_gpu = True
if want_gpu and torch.cuda.is_available():
    gpu = 1  # number of GPUs to use
else:
    gpu = None  # run on the CPU

# the Trainer class from PyTorch Lightning runs the training/evaluation loop of a deep NN:
# it places the model on the device, runs forward and backward passes, optimizes, logs stats, early stops...
wanted_precision = 32  # bits per floating-point number; 16 for half precision
trainer = pl.Trainer(gpus=gpu, precision=wanted_precision)

GPU available: True, used: True
INFO:lightning:GPU available: True, used: True
TPU available: False, using: 0 TPU cores
INFO:lightning:TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:lightning:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
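
The `precision` argument sets how many bits represent each floating-point number. To get a feel for what 16 bits cost, the standard-library sketch below stores 0.1 in IEEE 754 half, single and double precision (`struct` formats `e`, `f`, `d`) and reads it back. It only illustrates the number formats; it is not how PyTorch Lightning applies `precision=16` internally. Half precision keeps roughly 3 significant decimal digits (0.1 becomes 0.0999755859375), and its largest finite value is 65504.

```python
import struct


def roundtrip(x: float, fmt: str) -> float:
    """Store x with the given struct format and read it back.

    'e' = half precision (16 bits), 'f' = single (32 bits), 'd' = double (64 bits).
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


for bits, fmt in [(16, "e"), (32, "f"), (64, "d")]:
    stored = roundtrip(0.1, fmt)
    print(f"{bits}-bit: {stored!r}  error = {abs(stored - 0.1):.3e}")
```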


## Load and initialize the images

In [9]:
# folder with the test images and CSV file with their metadata
image_dir = "resources/images/im2gps"
meta_csv = "resources/images/im2gps_places365.csv"

In [10]:
# FiveCropImageDataset prepares the images before they are fed to the NN:
# in particular, it creates five different crops of every image
dataset = FiveCropImageDataset(meta_csv, image_dir)

Read resources/images/im2gps_places365.csv
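
Five-crop testing typically takes the four corner crops plus the center crop of each image, and the predictions over the crops are then combined. The sketch below only computes the five crop boxes, in the same order as torchvision's `FiveCrop` (top-left, top-right, bottom-left, bottom-right, center). The helper `five_crop_boxes` and the 256/224 sizes in the usage line are illustrative assumptions, not necessarily the values used by `FiveCropImageDataset`.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels, as PIL's Image.crop expects


def five_crop_boxes(width: int, height: int, size: int) -> Dict[str, Box]:
    """Boxes of the four corner crops and the center crop of a width x height image."""
    if size > width or size > height:
        raise ValueError("crop size larger than image")
    left_c = int(round((width - size) / 2.0))
    top_c = int(round((height - size) / 2.0))
    return {
        "top_left": (0, 0, size, size),
        "top_right": (width - size, 0, width, size),
        "bottom_left": (0, height - size, size, height),
        "bottom_right": (width - size, height - size, width, height),
        "center": (left_c, top_c, left_c + size, top_c + size),
    }


for name, box in five_crop_boxes(256, 256, 224).items():
    print(name, box)
```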


In [11]:
batch_size = 64
dataloader = torch.utils.data.DataLoader(
                    dataset,
                    batch_size=ceil(batch_size / 5),  # divide by 5 because each image yields 5 different crops
                    shuffle=False,
                    num_workers=4  # number of worker subprocesses that load batches in parallel
                )
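
Because `ceil` rounds up, `ceil(64 / 5) = 13` images are loaded per batch, so 13 × 5 = 65 crops go through the network in each forward pass, one more than the nominal 64. The hypothetical helper below spells out this arithmetic, together with the number of batches for a given number of test images (with the DataLoader default `drop_last=False`, the last batch may be smaller).

```python
from math import ceil

CROPS_PER_IMAGE = 5  # FiveCropImageDataset yields 5 crops per image


def loader_stats(n_images: int, target_crops: int = 64) -> dict:
    """Images per batch, crops per batch and number of batches for a five-crop DataLoader."""
    images_per_batch = ceil(target_crops / CROPS_PER_IMAGE)
    return {
        "images_per_batch": images_per_batch,
        "crops_per_batch": images_per_batch * CROPS_PER_IMAGE,
        "batches": ceil(n_images / images_per_batch),
    }


print(loader_stats(n_images=1000))
```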

## Run the model on the test set

In [12]:
results = trainer.test(model, test_dataloaders=dataloader, verbose=False)


## Look at the results

In [13]:
# formatting results into a pandas dataframe
df = pd.DataFrame(results[0]).T
#df["dataset"] = image_dir
df["partitioning"] = df.index
df["partitioning"] = df["partitioning"].apply(lambda x: x.split("/")[-1])
df.set_index(keys=["partitioning"], inplace=True)  # use keys=["dataset", "partitioning"] if the "dataset" column above is enabled
print(df)

                  1         25        200       750       2500
partitioning                                                  
coarse        0.092827  0.316456  0.497890  0.670886  0.789030
middle        0.139241  0.345992  0.481013  0.683544  0.793249
fine          0.156118  0.392405  0.489451  0.658228  0.784810
hierarchy     0.147679  0.375527  0.489451  0.683544  0.789030
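
Each column header is a distance threshold in km, and each value is the fraction of test images whose predicted GPS coordinate lies within that great-circle distance of the true location: for example, the `hierarchy` model places about 14.8% of the images within 1 km and 78.9% within 2500 km. These thresholds are commonly labelled street (1 km), city (25 km), region (200 km), country (750 km) and continent (2500 km). A minimal sketch of the metric, assuming a spherical Earth of radius 6371 km and the haversine formula (not necessarily the exact implementation in the GeoEstimation code):

```python
from math import asin, cos, radians, sin, sqrt
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)
THRESHOLDS_KM = (1, 25, 200, 750, 2500)  # the column headers of the table above


def great_circle_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Haversine distance in km between two (latitude, longitude) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))  # clamp guards against rounding above 1


def accuracy_at_thresholds(predicted, true, thresholds=THRESHOLDS_KM) -> Dict[int, float]:
    """Fraction of images whose predicted location lies within each distance threshold."""
    distances = [great_circle_km(p, t) for p, t in zip(predicted, true)]
    return {r: sum(d <= r for d in distances) / len(distances) for r in thresholds}
```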


In [None]:
# to save the dataframe on a csv file
fout = 'test_results.csv'
df.to_csv(fout)

These two commands fail with the same error:
```
OSError: /usr/local/lib/python3.8/dist-packages/torchtext/_torchtext.so: undefined symbol: _ZN3c104impl23ExcludeDispatchKeyGuardC1ENS_14DispatchKeySetE
```
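
An `undefined symbol` error raised while loading `_torchtext.so` usually means that the compiled torchtext extension was built against a different PyTorch release than the one that is installed; here the missing symbol is the constructor of the PyTorch C++ class `c10::impl::ExcludeDispatchKeyGuard`. The reinstall of `torchtext==0.7` in the setup cell presumably targets this, since torchtext 0.7 is the release paired with PyTorch 1.6. The sketch below checks the pairing using a subset of the compatibility table published in the torchtext README; `torchtext_matches` and `report` are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version

# Subset of the torch -> torchtext release pairings from the torchtext README.
TORCH_TO_TORCHTEXT = {"1.5": "0.6", "1.6": "0.7", "1.7": "0.8", "1.8": "0.9"}


def major_minor(v: str) -> str:
    """'1.6.0+cu101' -> '1.6'"""
    return ".".join(v.split("+")[0].split(".")[:2])


def torchtext_matches(torch_version: str, torchtext_version: str) -> bool:
    """True if the versions are a known pair; False otherwise (including torch versions not in the table)."""
    expected = TORCH_TO_TORCHTEXT.get(major_minor(torch_version))
    return expected is not None and major_minor(torchtext_version) == expected


def report() -> str:
    try:
        t, tt = version("torch"), version("torchtext")
    except PackageNotFoundError as missing:
        return f"not installed: {missing}"
    status = "OK" if torchtext_matches(t, tt) else "mismatch: reinstall the matching torchtext"
    return f"torch {t}, torchtext {tt}: {status}"


print(report())
```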

In [None]:
# inference with the pre-trained model
!python3 -m classification.inference --image_dir resources/images/im2gps/
print("\n\n\n")
# test the already trained model
!python -m classification.test

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/content/drive/MyDrive/GeoEstimation/classification/inference.py", line 8, in <module>
    from classification.train_base import MultiPartitioningClassifier
  File "/content/drive/MyDrive/GeoEstimation/classification/train_base.py", line 10, in <module>
    import pytorch_lightning as pl
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/__init__.py", line 56, in <module>
    from pytorch_lightning.core import LightningDataModule, LightningModule
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/__init__.py", line 14, in <module>
    from pytorch_lightning.core.datamodule import LightningDataModule
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/datamodule.py", line 22, in <module>
    

These are the remaining commands for training from scratch (which we will probably not use):

```bash
# download and preprocess images
wget https://github.com/TIBHannover/GeoEstimation/releases/download/v1.0/mp16_urls.csv -O resources/mp16_urls.csv
wget https://github.com/TIBHannover/GeoEstimation/releases/download/pytorch/yfcc25600_urls.csv -O resources/yfcc25600_urls.csv 
python download_images.py --output resources/images/mp16 --url_csv resources/mp16_urls.csv --shuffle
python download_images.py --output resources/images/yfcc25600 --url_csv resources/yfcc25600_urls.csv --shuffle --size_suffix ""

# assign cell(s) for each image using the original meta information
wget https://github.com/TIBHannover/GeoEstimation/releases/download/v1.0/mp16_places365.csv -O resources/mp16_places365.csv
wget https://github.com/TIBHannover/GeoEstimation/releases/download/pytorch/yfcc25600_places365.csv -O resources/yfcc25600_places365.csv
python partitioning/assign_classes.py
# remove images that were not downloaded 
python filter_by_downloaded_images.py

# train geo model from scratch
python -m classification.train_base --config config/baseM.yml
```

In [None]:
import os
import torch

# number of files in the im2gps test-image folder
print(len(os.listdir(r'/content/drive/MyDrive/GeoEstimation/resources/images/im2gps')))
os.chdir(r'/content/drive/MyDrive/GeoEstimation')
print(os.getcwd())

# True if PyTorch can use a CUDA GPU, False otherwise
print(torch.cuda.is_available())
# number of visible GPUs and the name of the first one
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0))

237
/content/drive/MyDrive/GeoEstimation
True
1
Tesla T4


In [None]:
#@title
# libraries to import
# familiar libraries
import pandas as pd
import numpy as np
import os
import re
import torchvision
import torch
import PIL
from PIL import Image
from PIL import ImageFile
import sys
import time
from math import ceil



# less familiar libraries
from typing import Union
from io import BytesIO
import random
from argparse import Namespace, ArgumentParser
from pathlib import Path
from multiprocessing import Pool
from functools import partial
import requests
import logging
import json
import yaml
from tqdm.auto import tqdm
#from classification.train_base import MultiPartitioningClassifier
#from classification.dataset import FiveCropImageDataset

# modules from the GeoEstimation repository
from classification import utils_global
from classification.s2_utils import Partitioning, Hierarchy
from classification.dataset import MsgPackIterableDatasetMultiTargetWithDynLabels


The main reference to follow is the [GeoEstimation repository](https://github.com/TIBHannover/GeoEstimation), and its [releases page](https://github.com/TIBHannover/GeoEstimation/releases/) provides the pretrained models.

Davide found this, which may be better: [Kaggle](https://www.kaggle.com/code/habedi/inspect-the-dataset/data)

[Here](https://qualinet.github.io/databases/image/world_wide_scale_geotagged_image_dataset_for_automatic_image_annotation_and_reverse_geotagging/) there are some links that could be downloaded in Colab with the `!wget` command.