# BIOINF529 Homework #4 - Winter 2020
This homework is worth **10% of your final grade**.

This homework is due before the next course module begins, as enforced by Canvas.

## Coding by Contract
We (the Instructors) promise a fair, impartial, and objective means of grading, provided that you (the Students) follow the tenets of Coding by Contract:
1. You must not modify/delete any of the existing code in this document (besides the `pass` statements)
2. Your functions must use the function signatures as written
3. Your functions must return/print the expected results (as written)

If these are followed correctly, your submission should be compatible with the automated testing suite. The more tests your code passes, the less scrutiny it will receive in our review. We do not care *how* you get there, just that you get there *correctly*.

## Submission
Please rename this notebook to **homework4_uniqname.ipynb** for submission. 

For example:
> `homework4_apboyle.ipynb`

We will *only* grade the most recent submission of your homework.

## Late Policy
Each submission will receive a **25%** penalty per day (up to three days) that the assignment is late.

After that, the student will receive a **0** for the homework.

## Academic Honor Code
You may consult with others. However, all answers must be your own and code comparison software will be used to enforce this rule. You are allowed to ask questions at office hours but the answers given will be high-level/conceptual in nature.


### 1. Needleman-Wunsch

Implement the Needleman-Wunsch algorithm, using the BLOSUM62 matrix for match/mismatch scores, to generate global alignments of protein sequences.

```
needleman_wunsch(seq1, seq2)
    Initialize [len(seq1)+1] x [len(seq2)+1] numpy array as scoring matrix with 
        first column and row starting at 0 and decrementing by gap penalty
    
    Fill scoring matrix (S, returned as score_matrix) and traceback matrix (T):
    for i in each row number (starting at 1):
        for j in each column number (starting at 1):
            S[i][j] = max( S[i-1][j-1] + compute_diag_score, S[i-1][j] + gap_score, S[i][j-1] + gap_score)
            T[i][j] = direction of max( S[i-1][j-1] + compute_diag_score, S[i-1][j] + gap_score, S[i][j-1] + gap_score)
    
    Traceback. Find the optimal path through scoring matrix starting at bottom right position
    
```

As a reminder from the slides, the score for each position in Needleman-Wunsch depends only on the scores above, to the left, and above-left of the current position in the matrix, as shown below:

<center><img src="figures/Needleman-Wunsch_scoring.png"></center>

For traceback, you will need to keep track of the direction of the arrows in a matrix and then begin traceback from the bottom-right position, since a global alignment must span both sequences end to end.
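The fill and traceback steps described above can be sketched on a toy example. This is a minimal illustration, not the graded solution: `toy_score` is a hypothetical stand-in for the provided `blosum_score` (+1 match, -1 mismatch), `toy_global_align` is an illustrative name, and plain lists are used instead of a NumPy array so the block runs on its own.

```python
DIAG, UP, LEFT = 0, 1, 2  # same move encoding as the assignment


def toy_score(a, b):
    """Hypothetical stand-in for blosum_score: +1 match, -1 mismatch."""
    return 1 if a == b else -1


def toy_global_align(seq1, seq2, gap=-2):
    rows, cols = len(seq1) + 1, len(seq2) + 1
    S = [[0] * cols for _ in range(rows)]     # scoring matrix
    T = [[DIAG] * cols for _ in range(rows)]  # traceback matrix

    # First column and first row decrement by the gap penalty.
    for i in range(1, rows):
        S[i][0], T[i][0] = i * gap, UP
    for j in range(1, cols):
        S[0][j], T[0][j] = j * gap, LEFT

    # Each cell depends only on its above-left, above, and left neighbors.
    for i in range(1, rows):
        for j in range(1, cols):
            candidates = [
                S[i - 1][j - 1] + toy_score(seq1[i - 1], seq2[j - 1]),  # DIAG
                S[i - 1][j] + gap,                                      # UP
                S[i][j - 1] + gap,                                      # LEFT
            ]
            S[i][j] = max(candidates)
            T[i][j] = candidates.index(S[i][j])  # first maximum wins ties

    # Traceback from the bottom-right cell back to the top-left cell.
    out1, out2 = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        move = T[i][j]
        if move == DIAG:
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i -= 1
            j -= 1
        elif move == UP:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:  # LEFT
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
    return "".join(reversed(out1)), "".join(reversed(out2)), S[-1][-1]


print(toy_global_align("GTTGAC", "GTTAC"))  # ('GTTGAC', 'GTT-AC', 3)
```

Pre-filling the first row and column of the traceback matrix with LEFT and UP is one way to let the traceback walk along an edge of the matrix; it is a design choice of this sketch, not something the assignment prescribes.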

We have provided a function to return scores from the BLOSUM62 matrix for amino acid substitutions. Use this for match and mismatch scores:
```
blosum_score('K', 'M') # Returns score for a Lysine to Methionine change (-1)
```

You are expected to implement the following functions: `needleman_wunsch`, `cal_score`, `compute_diag_score`, and `traceback`. Implement as defined in the code below and according to the concept of 'coding by contract' as discussed in class. If your functions do not take as input or provide as output the variables that we define, you will not receive credit.

In [205]:
import numpy as np
from homework4 import blosum_score

def needleman_wunsch(seq1, seq2, gap=-4):
    '''Needleman Wunsch algorithm for global alignment
    
    Args:
        seq1 (str): input seq 1
        seq2 (str): input seq 2
        gap (int): default = -4
    
    Returns:
        aligned_seq1 (str)
        aligned_seq2 (str)
        score_matrix (numpy array): scoring matrix
    '''
    
    #The following code was taken from Alan's Class 17 Solutions for Smith_Waterman and changed to fulfill the Needleman_Wunsch task. 
    #Here we are doing global alignment, NOT local alignment. 
   
    #initialize the row and col 
    num_row = len(seq1) + 1 #for first row 
    num_col = len(seq2) + 1 #for first column
   
    score_matrix = np.zeros(shape = (num_row, num_col), dtype = int)
    traceback_matrix = np.zeros(shape = (num_row, num_col), dtype = int) 
    
    #this is for the decrementing in the first row and column. For each value in the range, I want to multiply each by the same gap value. 
    #and then update the score_matrix with these values. 
    for index1 in range(0, num_row):
        score_matrix[index1][0] = index1 * gap #we're running through each row but not each column (which is why col index is 0)
    for index2 in range(0, num_col):
        score_matrix[0][index2] = index2 * gap #we're running through each column but not each row (row col index is 0) 

    # Fill the scoring matrix and the traceback matrix.
    # Row 0 and column 0 are already initialized, so both loops start at 1.
    for index1 in range(1, num_row):
        for index2 in range(1, num_col):
            # S[i][j] = max(S[i-1][j-1] + compute_diag_score, S[i-1][j] + gap_score, S[i][j-1] + gap_score)
            # T[i][j] = direction of that maximum (0-DIAG, 1-UP, 2-LEFT)
            score_matrix[index1][index2], traceback_matrix[index1][index2] = cal_score(score_matrix, seq1, seq2, index1, index2, gap)
    
    # Traceback: find the optimal path through the scoring matrix, starting at the bottom-right position
    aligned_seq1, aligned_seq2 = traceback(seq1, seq2, traceback_matrix)
    
    return aligned_seq1, aligned_seq2, score_matrix 
   

In [206]:
def cal_score(matrix, seq1, seq2, i, j, gap):
    '''Calculate score for position (i,j) in scoring matrix, also record move to trace back
    
    Args:
        matrix (numpy array): scoring matrix
        seq1 (str): input seq 1
        seq2 (str): input seq 2
        i (int): current row number
        j (int): current column number
        gap (int): gap penalty
        
    Returns:
        score in position (i,j)    
        move to trace back: 0-DIAG, 1-UP, 2-LEFT
        
    Pseudocode:
        Calculate scores based on upper-left, up, and left neighbors:
            diag_score = upper-left + (match or mismatch)
            up_score = up + gap
            left_score = left + gap
        score = max(diag_score, up_score, left_score)
        traceback = maximum direction or end
        
    '''
    #The following code was taken from Alan's Class 17 Solutions for Smith_Waterman and changed to fulfill the Needleman_Wunsch task. 
    
    # Calculate candidate scores from the upper-left, up, and left neighbors.
    # Match/mismatch scores come from BLOSUM62 via compute_diag_score().
    diag_score = matrix[i-1][j-1] + compute_diag_score(seq1[i-1], seq2[j-1])
    up_score = matrix[i-1][j] + gap
    left_score = matrix[i][j-1] + gap

    # Global alignment: the score is the max of the three moves.
    # Unlike Smith-Waterman, there is no floor at 0.
    score = max(diag_score, up_score, left_score)

    # The list order must match the move encoding: 0-DIAG, 1-UP, 2-LEFT.
    # np.argmax returns the first maximum, so ties resolve as DIAG, then UP, then LEFT.
    move = np.argmax([diag_score, up_score, left_score])
    
    return score, move 
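The move returned by `cal_score` depends on how ties between the three candidate scores are broken. `np.argmax` returns the index of the first maximum, so with the list order `[diag, up, left]` a tie prefers DIAG, then UP, then LEFT. The sketch below shows the same rule with a plain list so it runs without NumPy; `first_max_index` is an illustrative helper, not part of the assignment.

```python
DIAG, UP, LEFT = 0, 1, 2


def first_max_index(scores):
    """Index of the first maximum, mirroring np.argmax's tie-breaking rule."""
    return scores.index(max(scores))


# diag and up tie: DIAG is chosen because it comes first in the list.
print(first_max_index([3, 3, 1]) == DIAG)  # True

# up and left tie: UP is chosen.
print(first_max_index([1, 3, 3]) == UP)    # True
```

A different tie-breaking order can give a different alignment with the same total score, so the order of the list matters when output is compared against an expected alignment.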


In [207]:
def compute_diag_score(char1, char2):
    '''
        Calculate score for a diagonal shift for the matrix
        using provided blosum_score() function
        
        Args:
            char1 (str): base in alignment 1
            char2 (str): base in alignment 2
    
        Returns:
            diag_score (int): diag score
    '''
    #need to use the provided blosum_score() function. 
    #our arguments are two strings, so this should work. 
    
    #calling the function from the homework4 script. 
    diag_score = blosum_score(char1, char2) #this will return an int which is what we want. 
    
    return diag_score 


In [208]:
def traceback(seq1, seq2, traceback_matrix):
    '''Find the optimal path through scoring matrix 
        
        diagonal: match/mismatch
        up: gap in seq1
        left: gap in seq2
        gap symbol is '-'
        
    Args:
        seq1 (str) : First sequence being aligned
        seq2 (str) : Second sequence being aligned
        traceback_matrix (numpy array): traceback matrix
        
    Returns:
        aligned_seq1 (str): e.g. GTTGAC
        aligned_seq2 (str): e.g. GTT-AC
        
    Pseudocode:
        current_row and current_col = bottom right of matrix
        while not in top left of matrix:
            current_move = traceback_matrix[current_row][current_col]
            if current_move == DIAG:
                ...
            elif current_move == UP:
                ...
            elif current_move == LEFT:
                ...
            
    '''
    #The following code was taken from Alan's Class 17 Solutions for Smith_Waterman and changed to fulfill the Needleman_Wunsch task.  
    
    #initalize what needs to be returned, making the lists but will convert and join to strings at end. 
    aligned_seq1 = []
    aligned_seq2 = []
    gap = "-"  #needs to be this symbol 
    
    # Start at the bottom-right corner of the matrix.
    num_row, num_col = traceback_matrix.shape
    current_row = num_row - 1
    current_col = num_col - 1

    # Move encoding used by cal_score: 0-DIAG, 1-UP, 2-LEFT
    DIAG, UP, LEFT = range(3)

    # Continue until the top-left corner is reached. Using `or` (not `and`)
    # keeps the loop running along row 0 or column 0, so leading residues
    # are not dropped when one sequence is exhausted before the other.
    while current_row > 0 or current_col > 0:
        if current_row == 0:
            current_move = LEFT  # seq1 is exhausted: only left moves remain
        elif current_col == 0:
            current_move = UP    # seq2 is exhausted: only up moves remain
        else:
            current_move = traceback_matrix[current_row][current_col]

        if current_move == DIAG:
            aligned_seq1.append(seq1[current_row-1])
            aligned_seq2.append(seq2[current_col-1])
            current_row -= 1
            current_col -= 1

        elif current_move == UP:
            aligned_seq1.append(seq1[current_row-1])  # moving up one row
            aligned_seq2.append(gap)                  # gap placed in aligned_seq2
            current_row -= 1

        else:  # LEFT
            aligned_seq1.append(gap)                  # gap placed in aligned_seq1
            aligned_seq2.append(seq2[current_col-1])  # moving left one column
            current_col -= 1
            
    #now need to reverse the order of the sequences since we filled them backwards 
    aligned_seq1 = aligned_seq1[::-1]
    aligned_seq2 = aligned_seq2[::-1]
    
    #now convert the lists to strings. 
    aligned_seq1 = "".join(aligned_seq1) 
    aligned_seq2 = "".join(aligned_seq2)
    
    return aligned_seq1, aligned_seq2


In [209]:
# Example for testing
seq1 = 'MEKIGGTEKQDIPKYSLHFFSQILEIAPAAKGLFSFLRDSDEVPHNNPEEGKVVVADTTLQYLGSIHLKSGVIDPHFEALLRTLKEGGEKYNEEVEGAWSQAYDHLALAIES'
seq2 = 'MEKIGGEEAERKAVQATWARLYANCEDVGVAILVRFFVNFPSAKQYFSQFKHMEEMEHDPEKVSSVLSLVGKAHALKHKVEYFKLSGVILEVIAEEFANDFPPYSHVTAAYKEVQVPNATTPPATLPSSGP'

aligned_seq1, aligned_seq2, score_matrix = needleman_wunsch(seq1,seq2)

print(aligned_seq1)
print(aligned_seq2)
print(score_matrix)

MEKIGG--TE-K--QDI-PK-Y-SLHFFS-QIL-E--I-APAAKGLFS-FLRDSDEVPHNNPEEGKVVVADTTLQYLGSIH-LKSGVIDPHFE-A--LLRTL-KE-GGE--KYNEEVEGAWSQA-YDHLAL--A-I-ES--
MEKIGGEEAERKAVQATWARLYANCEDVGVAILVRFFVNFPSAKQYFSQF-KHMEEMEH-DP-E-K--VS-SVLSLVGKAHALKHKV-E-YFKLSGVILEVIAEEFANDFPPYS-HVTAAYKEVQVPNATTPPATLPSSGP
[[   0   -4   -8 ... -516 -520 -524]
 [  -4    5    1 ... -507 -511 -515]
 [  -8    1   10 ... -498 -502 -506]
 ...
 [-440 -431 -422 ...   47   43   39]
 [-444 -435 -426 ...   49   45   42]
 [-448 -439 -430 ...   56   52   48]]
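A global alignment like the one printed above can be sanity-checked without knowing the expected answer: both aligned strings must have the same length, removing the gap symbol must give back the original sequences, and no column should contain two gaps. The helper below is a small sketch of those checks; `check_alignment` is an illustrative name, not part of the assignment.

```python
def check_alignment(seq1, seq2, aligned1, aligned2, gap_char="-"):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if len(aligned1) != len(aligned2):
        problems.append("aligned sequences differ in length")
    if aligned1.replace(gap_char, "") != seq1:
        problems.append("aligned_seq1 does not reproduce seq1")
    if aligned2.replace(gap_char, "") != seq2:
        problems.append("aligned_seq2 does not reproduce seq2")
    if any(a == gap_char and b == gap_char for a, b in zip(aligned1, aligned2)):
        problems.append("a column contains a gap in both sequences")
    return problems


# The docstring example from traceback() passes all checks.
print(check_alignment("GTTGAC", "GTTAC", "GTTGAC", "GTT-AC"))  # []
```

These checks say nothing about whether the alignment is optimal; they only catch structural mistakes such as a traceback that stops before reaching the top-left corner.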


## My Work for Needleman_Wunsch Below:

In [13]:
DIAG, UP, LEFT = range(3)

In [14]:
print(DIAG)

0


In [15]:
print(UP)

1


In [16]:
print(LEFT)

2


In [82]:
test = ["A", "M", "E", "L", "I", "A"]
test
len(test)

6

In [34]:
test = "".join(test)
test

'AMELIA'

In [55]:
test_matrix = np.empty((2,2), dtype = float)
test_matrix 

array([[0., 0.],
       [0., 0.]])

In [56]:
#goal: I want the current_col and current_row to equal bottom right of matrix 

In [49]:
tuple_test = test_matrix.shape
tuple_test

(5, 5)

In [52]:
test_matrix[0][0]

0.0

In [78]:
test2 = np.random.randint(1,100,(3,3))
test2

array([[16, 31, 93],
       [89, 12, 72],
       [87, 52, 25]])

In [79]:
value_col = 2
value_row = 2
final = test2[value_row][value_col]
final

25

In [81]:
tup_test2 = test2.shape
list_test2 = list(tup_test2)
print(list_test2)
row_value = list_test2[0] - 1
col_value = list_test2[1] - 1

final = test2[row_value][col_value]
final

[3, 3]


25

In [113]:
final2 = test2[0][0]
final2

13

### 2. DBSCAN Clustering


Implement DBSCAN clustering to partition a set of 'points' into distinct clusters.

```
DBSCAN(points, epsilon, MinPts):
    Select a point
    If there are at least MinPts within epsilon of point, mark as core point and include core point and neighbors as cluster
    Repeat the previous step for each neighbor until no new points are added
    Select an unvisited point and repeat
```

<center><img src="figures/dbscan.gif"></center>

For this question, we have provided a dataclass Point that contains the information for each point in the 2D space where we will be clustering. This class is defined below and includes a description of how to use the class. A demo of the clustering is shown above where we have plotted the steps of DBSCAN for the demo data provided.

For this question, you will implement all of DBSCAN in the `DBSCAN` function defined below. It contains two nested helper functions, `find_neighbors` and `euclidean_distance`, which you will also implement. These helpers should help you break the algorithm described above into parts.

HINT: An efficient way to implement this is very similar to our breadth-first search implementation.
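As a reminder of the pattern the hint refers to, here is a generic breadth-first search sketch. It is not the implementation from class (which is not shown here); `bfs_reachable` and `neighbors_of` are illustrative names. The key ideas are a first-in, first-out queue and a `seen` set so that no node is queued twice.

```python
from collections import deque


def bfs_reachable(start, neighbors_of):
    """Return all nodes reachable from start, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()  # O(1), unlike list.pop(0)
        order.append(node)
        for nxt in neighbors_of(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


# Toy graph as an adjacency dict; node 5 is not reachable from node 1.
graph = {1: [2, 3], 2: [4], 3: [4], 4: [], 5: []}
print(bfs_reachable(1, graph.__getitem__))  # [1, 2, 3, 4]
```

In DBSCAN the "graph" is implicit: the neighbors of a point are the points within epsilon of it, and only core points pass their neighbors on to the queue.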

In [210]:
import math
from dataclasses import dataclass

@dataclass
class Point:
    ''' A dataclass provided to store points and their cluster
    Created as:
        new_point = Point(x_position=x, y_position=y, cluster=cluster)
    Values can be addressed as:
        new_point.x_position = 0
        print(new_point.y_position)
    '''
    x_position: float
    y_position: float
    cluster: int = None
    
def DBSCAN(points, epsilon, MinPts):
    ''' Algorithm for DBSCAN clustering of point objects
    
    Args:
        points (list of Point) : points in our 2D space to be clustered
        epsilon (float) : distance to allow assignment to current cluster 
        MinPts (int) : minimum number of points to assign a core_point in DBSCAN
        
    Returns:
        points (list of Point) : points in our 2D space with Point.cluster assigned to appropriate cluster 
        and non-assigned points in cluster 0 or cluster None
    
    '''
    
    def find_neighbors(point, points, epsilon):
        ''' Finds all neighboring points within euclidean distance epsilon of point
        
        Args:
            point (Point): point for which we are estimating distance (the starting point)
            points (list of Point): list of points for clustering (every point is compared against point)
            epsilon (float) : distance to allow assignment to current cluster (the neighborhood radius)
            
        Returns:
            neighbors (list of Point): list of all points within epsilon of point
        '''
        
        # Compare the given point against every point in the list and keep
        # those within epsilon, which acts as the threshold or radius.
        # The point itself is included, since its distance to itself is 0.
        neighbors = []
        for candidate in points:
            if euclidean_distance(point, candidate) <= epsilon:
                neighbors.append(candidate)

        return neighbors  
    
    def euclidean_distance(point1, point2):
        ''' Calculates Euclidean distance between two Points
        
        Args:
            point1 (Point): first point for comparison
            point2 (Point): second point for comparison
            
        Returns:
            distance (float): Euclidean distance between points
        '''
        #need to use the class
        x1 = point1.x_position 
        x2 = point2.x_position 
        y1 = point1.y_position 
        y2 = point2.y_position 
        
        #math.sqrt will return a float (I googled this). 
        distance = math.sqrt((x1-x2)**2 + (y1-y2)**2)
        
        return distance   
    
    
    # Beginning of the main DBSCAN algorithm.
    # Per Brad, I do not have to randomly select a point; I can just iterate through each point in the points list.
    
    # The code below is taken from my homework_4_template_doc, so please refer to that doc for the full explanation
    # of why it does not fully work for DBSCAN.
    
    # I did end up getting four clusters, not including the outliers, which are designated with 0's.
    # The problem is that the clusters are labeled "2", "6", "11", and "12" instead of "1", "2", "3", "4".
    # It must be due to the neighbors not being iterated properly with BFS, since to my knowledge they are
    # checked properly for core, border, or outlier status. I tried to implement the BFS again with the neighbor
    # list, and I got a worse result: 5+ clusters plus outliers (which were still 0), with clusters labeled wrong.
    # Hoping to discuss this with the instructors or Brad after the homework is handed in to see exactly what I did wrong.
    
    # NOTE:
    # What I tried is commented out, since it gives more than 4 clusters. I wanted to show that I did
    # try it, since it conceptually makes sense to me, but it was giving me huge cluster values.
    
        
    core = 1  
    not_core = 0
    N = []
    
    for pt in points:
        if pt.cluster == None:
            N.append(pt)
            while N: #breadth search 
                node = N.pop(0)  #need to pop out the first in index. 
                # print(node)
                if node.cluster == None: #which they are all starting as none, so all of them. 
                    neighs = find_neighbors(node, points, epsilon)  #this will output the neighs list. 
                    #need to meet our criteria to be a core point 
                    if len(neighs) >= MinPts:
                        node.cluster = core #then it is a core point. 
                        while neighs:
                            ngh = neighs.pop(0)
                        # for neigh in neighs: #take that core point's neighbors and see if they are core points as well. 
                            neighs2 = find_neighbors(ngh, points, epsilon)
                            if len(neighs2) >= MinPts:
                                N.append(ngh)
                                # neighs2.append(ngh) | gives me 5+ clusters SHOULD BE THIS I THINK
                                #one point frm master list, and then testing a subset, now work with subset, 
                                node.cluster = core
                                # ngh.cluster = core | this leads to 5+ clusters | SHOULD BE THIS I THINK SINCE WE'RE ONLY LOOKING AT NEIGHBORS
                            else:
                                node.cluster = core #these points even though are not core points, they are still in this cluster. 
                                # ngh.cluster = core | this leads to 5+ clusters | SHOULD BE THIS I THINK SINCE WE'RE LOOKING AT ONLY NEIGHBORS 
                    else: #if not a core point. 
                        # pass 
                        node.cluster = not_core  #not a core point and not in the cluster/could be outlier. 
                else: #if it's not a None point at all, just pass. 
                    pass
            core = core + 1
        else:
            pass 
        
    return points


In [211]:
# This is demo code to initialize a set of point objects similar to that in the slides from class
demo_points_x = [3,4,4,5,5,6,6,7,7,5,6,7,8,8,8,9,11,12,13,13,17,17,18,18,18,19,21]
demo_points_y = [7,5,3,6,7,5,7,6,5,14,15,13,16,13,12,14,13,15,15,13,7,9,5,7,9,6,8]
list_of_points = []
for x,y, in zip(demo_points_x, demo_points_y):
    list_of_points.append(Point(x_position=x, y_position=y))

In [212]:
points = DBSCAN(list_of_points, epsilon=2, MinPts=3)
print(points)

[Point(x_position=3, y_position=7, cluster=0), Point(x_position=4, y_position=5, cluster=2), Point(x_position=4, y_position=3, cluster=0), Point(x_position=5, y_position=6, cluster=2), Point(x_position=5, y_position=7, cluster=2), Point(x_position=6, y_position=5, cluster=2), Point(x_position=6, y_position=7, cluster=2), Point(x_position=7, y_position=6, cluster=2), Point(x_position=7, y_position=5, cluster=2), Point(x_position=5, y_position=14, cluster=0), Point(x_position=6, y_position=15, cluster=0), Point(x_position=7, y_position=13, cluster=6), Point(x_position=8, y_position=16, cluster=0), Point(x_position=8, y_position=13, cluster=6), Point(x_position=8, y_position=12, cluster=6), Point(x_position=9, y_position=14, cluster=0), Point(x_position=11, y_position=13, cluster=0), Point(x_position=12, y_position=15, cluster=0), Point(x_position=13, y_position=15, cluster=11), Point(x_position=13, y_position=13, cluster=11), Point(x_position=17, y_position=7, cluster=12), Point(x_positi
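For comparison, here is a minimal sketch of one way the breadth-first expansion can be organized so that labels come out consecutive. It is a sketch under stated assumptions, not the reference solution: `region_query` and `dbscan_sketch` are illustrative names, a point counts as its own neighbor, and noise is labeled 0. Reading the code above, two likely causes of the labels 2, 6, 11, and 12 are that the counter advances after every unvisited starting point (including noise points) and that neighbors are never labeled directly; the sketch only advances the counter when a new core point is found and labels each queued neighbor as it is dequeued.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Point:
    x_position: float
    y_position: float
    cluster: Optional[int] = None


def region_query(point, points, epsilon):
    """All points within epsilon of point (including point itself)."""
    return [
        p for p in points
        if math.hypot(p.x_position - point.x_position,
                      p.y_position - point.y_position) <= epsilon
    ]


def dbscan_sketch(points, epsilon, min_pts):
    cluster_id = 0
    for start in points:
        if start.cluster is not None:
            continue  # already labeled as noise or as part of a cluster
        neighbors = region_query(start, points, epsilon)
        if len(neighbors) < min_pts:
            start.cluster = 0  # noise for now; may become a border point later
            continue
        cluster_id += 1        # advance only when a new core point is found
        start.cluster = cluster_id
        queue = deque(neighbors)
        while queue:
            p = queue.popleft()
            if p.cluster == 0:
                p.cluster = cluster_id  # former noise becomes a border point
            if p.cluster is not None:
                continue                # already assigned; do not expand again
            p.cluster = cluster_id
            p_neighbors = region_query(p, points, epsilon)
            if len(p_neighbors) >= min_pts:
                queue.extend(p_neighbors)  # only core points expand the cluster
    return points


xs = [3, 4, 4, 5, 5, 6, 6, 7, 7, 5, 6, 7, 8, 8, 8, 9, 11, 12, 13, 13,
      17, 17, 18, 18, 18, 19, 21]
ys = [7, 5, 3, 6, 7, 5, 7, 6, 5, 14, 15, 13, 16, 13, 12, 14, 13, 15, 15, 13,
      7, 9, 5, 7, 9, 6, 8]
demo = dbscan_sketch([Point(x, y) for x, y in zip(xs, ys)], epsilon=2, min_pts=3)
print(sorted({p.cluster for p in demo}))  # [0, 1, 2, 3, 4]
```

Border points are assigned to whichever cluster reaches them first, so their labels can depend on the order of the input list.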

In [213]:
#done 

## My Work for DBSCAN Below

In [None]:
#def DBSCAN(points, epsilon, MinPts)                   
    
    
#final options. 
#     core = 1  
#     not_core = 0
#     N = []
    
#     for pt in points:
#         if pt.cluster == None:
#             N.append(pt)
#             while N: #breadth search 
#                 node = N.pop(0)  #need to pop out the first in index. 
#                 # print(node)
#                 if node.cluster == None: #which they are all starting as none, so all of them. 
#                     neighs = find_neighbors(node, points, epsilon)  #this will output the neighs list. 
#                     #need to meet our criteria to be a core point 
#                     if len(neighs) >= MinPts:
#                         node.cluster = core #then it is a core point.  
#                         for neigh in neighs: #take that core point's neighbors and see if they are core points as well. 
#                             neighs2 = find_neighbors(neigh, points, epsilon)
#                             if len(neighs2) >= MinPts:
#                                 N.append(neigh)
#                                 #one point frm master list, and then testing a subset, now work with subset, 
#                                 node.cluster = core
#                             else:
#                                 node.cluster = core #these points even though are not core points, they are still in this cluster. 
#                     else: #if 
#                         node.cluster = not_core  #not a core point and not in the cluster/could be outlier. 
#                 else: #if it's not a node period. 
#                     pass 
#             core = core + 1
#         else:
#             pass 
        
#     return points
    
    

#option 1:
#     cluster = 1 
#     not_core = 0 
#     all_neighbors = []
    
#     for point in points:
#         neighbors1 = find_neighbors(point, points, epsilon)
#         all_neighbors.append(neighbors1) 
#         for index, lst in enumerate(all_neighbors):
#             if len(all_neighbors[index]) >= MinPts:
#                 points[index].cluster = cluster
#                 for index2, neigh_point in enumerate(
    
    
#option 2:
#     cluster = 1 
#     not_core = 0 
    
#     for point in points:
#         neighbor_list1 = find_neighbors(point, points, epsilon)
#         if len(neighbor_list1) >= MinPts:
#             point.cluster = cluster 
            
#             for neigh_point in neighbor_list1:
#                 neighbor_list2 = find_neighbors(neigh_point, points, epsilon)
#                 if len(neighbor_list2) >= MinPts:
#                     neigh_point.cluster = cluster 
#                 else:
#                     neigh_point.cluster = not_core 
#             cluster += cluster 
#         else:
#             point.cluster = not_core 
            
    # return points 
    
    
#option 3:
#     for pt in points:
#         if pt.cluster == None:
#             N.append(pt)
#             while N: #breadth search 
#                 node = N.pop()  #need to pop out the first in index. 
#                 # print(node)
#                 if node.cluster == None:
#                     neighs = find_neighbors(node, points, epsilon) 
#                     if len(neighs) >= MinPts:
#                         node.cluster = CN 
#                         for neigh in neighs:
#                             N.append(neigh)
#                     else:
#                         node.cluster = 0
#                 else:
#                     pass 
#             CN = CN + 1
#         else:
#             pass 
        
#     return points

    
#option 4:    
#     for index, point in enumerate(points):
#         neighbor_list1 = find_neighbors(points[index], points, epsilon) #refer to neighbors list  
#         if len(neighbor_list1) >= MinPts:
#             #this point is a core point. 
#             points[index].cluster = cluster    
#             #now looking at the core point's neighbors. 
#             for index2, neigh_point in enumerate(neighbor_list1):
#                 neighbor_list2 = find_neighbors(neighbor_list1[index2], points, epsilon)
#                 #so this neighboring point is a core too! 
#                 if len(neighbor_list2) >= MinPts:
#                     point.cluster = cluster 
#                 #for those that don't have any neighbors. 
#                 else:
#                     point.cluster = not_core       
#                 # if len(neighbor_list2) < MinPts:
#                 #     neigh_point.cluster = not_core 
#             cluster += 1
#         else:
#             point.cluster = not_core 
#         #need to update cluster now
        
#     return points 

In [66]:
demo_points_x = [3,4,4,5,5,6,6,7,7,5,6,7,8,8,8,9,11,12,13,13,17,17,18,18,18,19,21]
print(len(demo_points_x))
random_point = np.random.choice(demo_points_x, 1)
print(random_point)
demo_points_x.remove(random_point)
print(demo_points_x)
print(len(demo_points_x))

27
[6]
[3, 4, 4, 5, 5, 6, 7, 7, 5, 6, 7, 8, 8, 8, 9, 11, 12, 13, 13, 17, 17, 18, 18, 18, 19, 21]
26


In [86]:
test_list = [1,2,3,4,5]
test_list2 = [9,8,7,6,5]
final = []

for i in test_list:
    final.append(i * 2)
    print(final)

print(final)
final2 = zip(final, test_list2)
list(final2)

[2]
[2, 4]
[2, 4, 6]
[2, 4, 6, 8]
[2, 4, 6, 8, 10]
[2, 4, 6, 8, 10]


[(2, 9), (4, 8), (6, 7), (8, 6), (10, 5)]

In [56]:
def multiply(num1, num2):
    product = num1 * num2
    
    return product

In [57]:
def test(num, num_list, threshold):
    result_list = []
    for index in range(len(num_list)):
        result_temp = multiply(num, num_list[index])
        if result_temp <= threshold:
            result_list.append(num_list[index]) 
            
    return result_list
        

In [38]:
multiply(5,20)

100

In [48]:
temp_list = [1,2,3,4,5]       

In [45]:
test(5, temp_list, 9)

[1]

In [77]:
test_list = [1,2,3,4,5]

final = test_list
final
final.index(5)

4

In [60]:
#testing the function below here. 

def find_neighbors(point, points, epsilon):
        ''' Finds all neighboring points within euclidean distance epsilon of point
        
        Args:
            point (Point): point for which we are estimating distance  starting point 
            points (list of Point): list of points for clustering  every other point 
            epsilon (float) : distance to allow assignment to current cluster   epsilon is a variable (distance) 
            
        Returns:
            neighbors (list of Point): list of all points within epsilon of point
        '''
        
        #initialize neighbors list first 
        neighbors = []
        #get my points here. We already have 
        for index in range(len(points)): #need to go through points list for each point. 
            distance_temp = multiply(point, points[index])
            if distance_temp <= epsilon:  #epsilon acts as the threshold or radius.  
                neighbors.append(points[index]) 
            
        #need to take your point and then for loop through the points list and compare that point with your point 
        #euclidean_distance 
        #only finding the neighbors for that specific point. 
        #<= epsilon 
        return neighbors  


In [189]:
find_neighbors(2, temp_list, 1)

[]

In [None]:
temp_list = [1,2,3,4,5]   

In [34]:
np.random.choice(test_list, 1)

array([3])

In [15]:
test_list = [1,3,4,10]
node = test_list.pop(0)
node

1

In [194]:
print(test_list)

[1, 1]
