# [TLDR] given several `home planets`, we look outwards from all of them simultaneously to find the `nearest` new `neighbouring planets` to our civilization

- `home planets`
  - they denote products that match our search tokens
  - because they match the search tokens, I consider them good starting points from which to look for new products
  - these new products don't need to have token matches, just to be "similar"
- `neighbouring planets`
  - they denote products that didn't satisfy our original search token request
  - products that are "close" in numerical space are probably a reasonable place to look for additional products
- `nearest`
  - this is nearness in the numerical sense, i.e. it uses the numerical attributes in our dataset
  - numerical attributes deemed useful (in some sense) are cleaned up and normalized so that a numeric distance metric can be applied to them
    - NaNs are imputed with the population mean (to avoid throwing away products)
    - log-transformations are applied to make the data more normally distributed
    - means are stripped and variances normalized to put all attributes on a similar scale (as the numeric metric is sensitive to scale)
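
The clean-up steps above can be sketched roughly as follows. This is only an illustration, not the code inside `Dataset`: the helper `prepare_numeric`, the column names and the use of `log1p` (which assumes non-negative values) are all assumptions, as is the order of the imputation and log steps.

```python
import numpy as np
import pandas as pd

def prepare_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Impute, log-transform and standardize numeric columns (hypothetical helper)."""
    # impute NaNs with the column mean so that no product is thrown away
    out = df.fillna(df.mean())
    # log1p compresses long right tails; assumes all values are non-negative
    out = np.log1p(out)
    # strip the mean and scale to unit variance so no attribute dominates the metric
    return (out - out.mean()) / out.std(ddof=0)

# made-up attributes, not taken from the real dataset
toy = pd.DataFrame({
    "price":  [1.0, 10.0, np.nan, 1000.0],
    "weight": [0.5, np.nan, 2.0, 8.0],
})
df_num = prepare_numeric(toy)
```

After this, every column of `df_num` has zero mean and unit variance, so a Euclidean distance treats the attributes on an equal footing.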

# IMPORTS

In [None]:
%run ipynb_setup.ipynb

In [None]:
%run class_Dataset.ipynb

In [None]:
from typing import List
from sklearn.neighbors import NearestNeighbors

# CLASS DEF

In [None]:
class NeighbourSearch():

    def __init__(
        self,
        dataset   : Dataset,
        ) -> None :

        self.dataset   = dataset
        self.model     = NearestNeighbors().fit(self.dataset.df_num)
    
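    # given a list of locs, return a Series of `n_nearest` x len(locs) distances, indexed by the locs closest to the original locs
    # (the original locs themselves are included at distance 0, and a loc near several origins appears more than once)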
    def get_n_nearest_from_locs(
        self,
        n_nearest : int,
        locs      : List[int],
        ):
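        # get the n_nearest neighbours of each element of locs
        # (each loc is itself among its own neighbours, at distance 0)
        distances, indices = self.model.kneighbors(
            self.dataset.df_num.loc[locs],
            n_neighbors = n_nearest,
        )

        # merge results into a single Series (index = neighbour, value = distance),
        # ordered from nearest to furthest, as we want the globally nearest to the origin family;
        # kneighbors returns row positions, so map them back to index labels to match `.loc`
        nearest_results = pd.Series(
            np.ravel(distances),
            index = self.dataset.df_num.index[np.ravel(indices)],
        ).sort_values(ascending=True)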
        
        # return
        return nearest_results

In [None]:
'''
neighbour_search=NeighbourSearch(dataset=Dataset())
neighbour_search.get_n_nearest_from_locs(n_nearest=5,locs=[0,1,2])
neighbour_search.get_n_nearest_from_locs(n_nearest=5,locs=[0,1,2,3,4,5,6])
'''
None
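
To see what the merged result looks like without loading the real `Dataset`, here is a small self-contained sketch on made-up, already-normalized data. The toy values and the `home` list are hypothetical, and the last two lines (dropping the home planets and keeping each neighbour once) are an assumed post-processing step that is not part of `get_n_nearest_from_locs`.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# toy, already-normalized attributes; the values are made up for illustration
df_num = pd.DataFrame({
    "x": [0.0, 0.1, 1.0, 5.0, 5.2],
    "y": [0.0, 0.0, 1.0, 5.0, 5.0],
})
model = NearestNeighbors().fit(df_num)

home = [0, 3]  # hypothetical home planets (rows that matched the search tokens)
distances, indices = model.kneighbors(df_num.loc[home], n_neighbors=3)

# flatten into one Series: index = neighbour row, value = distance to its home planet
merged = pd.Series(np.ravel(distances), index=np.ravel(indices)).sort_values()

# assumption: drop the home planets themselves and keep each neighbour once,
# at its smallest distance (the Series is already sorted, so "first" is nearest)
neighbours = merged[~merged.index.isin(home)]
neighbours = neighbours[~neighbours.index.duplicated(keep="first")]
```

Here row 2 is a neighbour of both home planets, so it shows up twice in `merged`; after de-duplication it is kept only with the shorter of its two distances.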