
In [32]:
import pandas as pd

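# Rows 0, 4, and 6 are identical (all zeros).
df = pd.DataFrame({'A': [0, 0, 0, 0, 0, 1, 0],
                   'B': [0, 2, 3, 5, 0, 2, 0],
                   'C': [0, 3, 4, 1, 0, 2, 0]})
print(df, "\n")

# drop_duplicates() keeps the first copy of each repeated row and drops the rest.
df = df.drop_duplicates()
print(df)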


   A  B  C
0  0  0  0
1  0  2  3
2  0  3  4
3  0  5  1
4  0  0  0
5  1  2  2
6  0  0  0 

   A  B  C
0  0  0  0
1  0  2  3
2  0  3  4
3  0  5  1
5  1  2  2
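
The call above relies on the defaults of `drop_duplicates()`. As a minimal sketch on the same data, the example below shows `duplicated()` plus the `keep` and `subset` parameters; the variable names `mask`, `last_kept`, and `by_b` are illustrative and not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 0, 0, 1, 0],
                   'B': [0, 2, 3, 5, 0, 2, 0],
                   'C': [0, 3, 4, 1, 0, 2, 0]})

# duplicated() flags every row that repeats an earlier row.
mask = df.duplicated()
print(mask.tolist())             # [False, False, False, False, True, False, True]

# keep='last' retains the final copy of each duplicate group instead of the first.
last_kept = df.drop_duplicates(keep='last')
print(last_kept.index.tolist())  # [1, 2, 3, 5, 6]

# subset limits the comparison to the named columns (here only column B).
by_b = df.drop_duplicates(subset=['B'])
print(by_b.index.tolist())       # [0, 1, 2, 3]
```

Inspecting `duplicated()` first is a cheap way to see what would be removed before committing to the drop.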


In [33]:
import pandas as pd
import numpy as np

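# np.nan is the canonical spelling; the np.NAN alias was removed in NumPy 2.0.
# An int column cannot hold missing values: older pandas releases fall back to
# object columns here (as in the output below), while newer releases may raise
# instead, in which case drop dtype=int.
df = pd.DataFrame({'A': [0, 0, 1, None],
                   'B': [1, 2, 3, 4],
                   'C': [np.nan, 3, 4, 1]},
                  dtype=int)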
print(df, "\n")

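# Column means, truncated to int (A: 1/3 -> 0, B: 2.5 -> 2, C: 8/3 -> 2).
values = pd.Series(df.mean(), dtype=int)
print(values, "\n")

# fillna() with a Series fills each column using the value whose label matches.
df = df.fillna(values)
print(df)

#The fillna() function replaces missing values, whether they
#appear as NaN (not a number) or as None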



      A  B    C
0     0  1  NaN
1     0  2    3
2     1  3    4
3  None  4    1 

A    0
B    2
C    2
dtype: int64 

   A  B  C
0  0  1  2
1  0  2  3
2  1  3  4
3  0  4  1
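
The cell above truncates the column means to integers before filling. As a hedged variation, the sketch below keeps the float means and also shows `dropna()` as the alternative of discarding incomplete rows; the names `means` and `filled` are illustrative, and the frame is built without `dtype=int` so the columns containing gaps are plain floats.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, np.nan],
                   'B': [1, 2, 3, 4],
                   'C': [np.nan, 3, 4, 1]})

# mean() skips NaN by default, so each mean uses only the values present.
means = df.mean()

# Each column's gaps are filled with that column's own mean.
filled = df.fillna(means)
print(filled)

# Alternative: drop every row that contains at least one missing value.
print(df.dropna())
```

Filling keeps all four rows at the cost of inventing values; dropping keeps only observed data but loses rows 0 and 3 here.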




# Stacking and Piling Data in Order

In [37]:
#A stack provides last-in, first-out (LIFO) data storage. Here a plain Python list acts as the stack.
#Note that numpy.stack() and DataFrame.stack() join or reshape arrays; they are not LIFO containers.

MyStack = []
StackSize = 3

def DisplayStack():
  print("Stack currently contains:")
  for item in MyStack:
    print(item)

def Push(Value):
  if len(MyStack) < StackSize:
    MyStack.append(Value)
  else:
    print("Stack is full!")

def Pop():
  if len(MyStack) > 0:
    print("Popping:", MyStack.pop())
  else:
    print("Stack is empty.")

Push(1)
Push(2)
Push(3)
DisplayStack()
Push(4)
Pop()
DisplayStack()
Pop()
Pop()
Pop()

Stack currently contains:
1
2
3
Stack is full!
Popping: 3
Stack currently contains:
1
2
Popping: 2
Popping: 1
Stack is empty.
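
The functions above share a module-level list and report problems by printing. As one possible alternative, the sketch below wraps the same LIFO behavior in a class that returns values instead; `BoundedStack` and its method names are hypothetical and do not appear in the original notebook.

```python
class BoundedStack:
    """A LIFO stack with a fixed capacity (illustrative helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = []

    def push(self, value):
        # Report failure with False so the caller decides how to react.
        if len(self._items) >= self.capacity:
            return False
        self._items.append(value)
        return True

    def pop(self):
        # list.pop() removes the most recently pushed item; None signals empty.
        if not self._items:
            return None
        return self._items.pop()

    def __len__(self):
        return len(self._items)


stack = BoundedStack(3)
results = [stack.push(n) for n in (1, 2, 3, 4)]
print(results)   # [True, True, True, False]

popped = [stack.pop() for _ in range(4)]
print(popped)    # [3, 2, 1, None]
```

Keeping the list inside the object allows several independent stacks to exist at once, which the global `MyStack` cannot do.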


**Using queues**

In [39]:
#Unlike stacks, queues are FIFO data structures

import queue

MyQueue = queue.Queue(3)

print("Queue empty: ", MyQueue.empty())

MyQueue.put(1)
MyQueue.put(2)
MyQueue.put(3)
print("Queue full:" , MyQueue.full())

print("Popping: " , MyQueue.get())
print("Queue full: ", MyQueue.full())

print("Popping: ", MyQueue.get())
print("Popping: ", MyQueue.get())
print("Queue empty: ", MyQueue.empty())

Queue empty:  True
Queue full: True
Popping:  1
Queue full:  False
Popping:  2
Popping:  3
Queue empty:  True
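
One detail the cell above does not exercise: `put()` on a full `queue.Queue` and `get()` on an empty one block by default. The sketch below, using illustrative variable names, shows the non-blocking variants `put_nowait()` and `get_nowait()`, which raise `queue.Full` and `queue.Empty` instead.

```python
import queue

q = queue.Queue(maxsize=2)
q.put_nowait("a")
q.put_nowait("b")

# A third item does not fit; put_nowait() raises queue.Full instead of waiting.
try:
    q.put_nowait("c")
    overflowed = False
except queue.Full:
    overflowed = True

# Items come back in the order they went in (FIFO).
first = q.get_nowait()
second = q.get_nowait()

# The queue is now empty; get_nowait() raises queue.Empty instead of waiting.
try:
    q.get_nowait()
    underflowed = False
except queue.Empty:
    underflowed = True

print(overflowed, first, second, underflowed)   # True a b True
```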


**Finding data using dictionaries**

*Special rules for making a key*

1) The key must be unique.
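2) The key must be immutable (more precisely, hashable): strings, numbers, and tuples of immutable items qualify, while lists and dictionaries do not.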
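
The two rules above can be checked directly. This is a small sketch with made-up names (`grid`, `list_key_ok`), not code from the original notebook.

```python
# Tuples are immutable and hashable, so they work as dictionary keys.
grid = {(0, 0): "origin", (1, 2): "point"}

# Lists are mutable and unhashable, so using one as a key raises TypeError.
try:
    bad = {[0, 0]: "origin"}
    list_key_ok = True
except TypeError:
    list_key_ok = False

# Keys are unique: assigning to an existing key replaces its value
# rather than adding a second entry.
grid[(0, 0)] = "start"

print(list_key_ok)   # False
print(len(grid))     # 2
print(grid[(0, 0)])  # start
```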


In [73]:
Colors = {"Sam": "Blue", "Amy": "Red", "Sarah": "Yellow"}

print(Colors["Sarah"])
print(Colors.keys())



for name, color in Colors.items():
  print(f"{name} likes the color {color}.")

Colors["Sarah"] = "Purple"

Colors.update({"Harry": "Orange"})

del Colors["Sam"]


print(Colors)

Yellow
dict_keys(['Sam', 'Amy', 'Sarah'])
Sam likes the color Blue.
Amy likes the color Red.
Sarah likes the color Yellow.
{'Amy': 'Red', 'Sarah': 'Purple', 'Harry': 'Orange'}
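
After `del Colors["Sam"]`, looking up `Colors["Sam"]` would raise `KeyError`. The sketch below shows three standard ways to handle a key that may be absent; it starts from the final state of the dictionary printed above, and the lowercase variable names are illustrative.

```python
colors = {"Amy": "Red", "Sarah": "Purple", "Harry": "Orange"}

# Membership test: True only if the key is present.
has_sam = "Sam" in colors

# get() returns the supplied default instead of raising KeyError.
sam_color = colors.get("Sam", "unknown")

# pop() removes a key and returns its value; the default avoids KeyError.
removed = colors.pop("Harry", None)
missing = colors.pop("Nobody", None)

print(has_sam, sam_color, removed, missing)   # False unknown Orange None
print(colors)                                 # {'Amy': 'Red', 'Sarah': 'Purple'}
```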


**Back to the TREES!!**

A tree structure resembles a physical tree in the natural world. Using trees helps you organize data quickly and find it in less time than many other data-storage techniques allow. Trees are commonly used for search and sort routines.

A root node provides the starting point for the various kinds of processing you perform. Connected to the root node are either branches or leaves. A leaf node is always an endpoint of the tree. Branch nodes support either other branches or leaves.

In [82]:
class binaryTree:
  def __init__(self, nodeData, left=None, right=None):
    self.nodeData = nodeData
    self.left = left
    self.right = right

  # Defined at class level so that str(node) and print(node) use it.
  def __str__(self):
    return str(self.nodeData)

#Creates a basic tree object that defines the three elements that a node must include:
#data storage, left connection, and right connection

tree = binaryTree("Root")
BranchA = binaryTree("Branch A")
BranchB = binaryTree("Branch B")
tree.left = BranchA
tree.right = BranchB

LeafC = binaryTree("Leaf C")
LeafD = binaryTree("Leaf D")
LeafE = binaryTree("Leaf E")
LeafF = binaryTree("Leaf F")
BranchA.left = LeafC
BranchA.right = LeafD
BranchB.left = LeafE
BranchB.right = LeafF

#Use recursion to traverse the tree we just built in post-order:
#left subtree first, then right subtree, then the node itself
def traverse(tree):
  if tree.left is not None:
    traverse(tree.left)
  if tree.right is not None:
    traverse(tree.right)
  print(tree.nodeData)

traverse(tree)

Leaf C
Leaf D
Branch A
Leaf E
Leaf F
Branch B
Root
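
The output above visits children before their parent, which is a post-order traversal. For comparison, the sketch below rebuilds the same seven-node tree with a minimal stand-in class (`Node`, a hypothetical name) and returns the pre-order and in-order visiting sequences as lists.

```python
class Node:
    """Minimal stand-in for the binaryTree class used above."""

    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def preorder(node):
    # Node first, then left subtree, then right subtree.
    if node is None:
        return []
    return [node.data] + preorder(node.left) + preorder(node.right)


def inorder(node):
    # Left subtree first, then the node, then right subtree.
    if node is None:
        return []
    return inorder(node.left) + [node.data] + inorder(node.right)


root = Node("Root",
            Node("Branch A", Node("Leaf C"), Node("Leaf D")),
            Node("Branch B", Node("Leaf E"), Node("Leaf F")))

print(preorder(root))
# ['Root', 'Branch A', 'Leaf C', 'Leaf D', 'Branch B', 'Leaf E', 'Leaf F']
print(inorder(root))
# ['Leaf C', 'Branch A', 'Leaf D', 'Root', 'Leaf E', 'Branch B', 'Leaf F']
```

Returning lists rather than printing makes the order easy to compare or test.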


**Types of Trees**

Balanced trees: A kind of tree that maintains a balanced structure through reorganization so that it can provide reduced access times.

Unbalanced trees: A tree that places new data items wherever necessary in the tree without regard to balance. This method builds the tree faster but reduces access speed when searching or sorting.

Heaps: A tree that keeps every parent ordered relative to its children (in a min-heap, no parent is larger than its children), so the smallest item is always at the root. Insertions preserve this ordering, which makes sorting faster.
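
To make the three descriptions above concrete, here is a rough sketch under stated assumptions: a plain binary search tree with no rebalancing (the `Node`, `insert`, `height`, and `build` names are hypothetical) shows how insertion order affects depth, and the standard-library `heapq` module stands in for a heap.

```python
import heapq


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    # Plain binary-search-tree insert with no rebalancing.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def height(root):
    # Number of nodes on the longest root-to-leaf path.
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


# Sorted input degenerates into a chain: every lookup may walk all 7 nodes.
sorted_height = height(build([1, 2, 3, 4, 5, 6, 7]))
# The same keys in a friendlier order give a balanced tree of height 3.
mixed_height = height(build([4, 2, 6, 1, 3, 5, 7]))
print(sorted_height, mixed_height)   # 7 3

# heapq keeps a list arranged as a min-heap: heap[0] is always the smallest.
heap = []
for value in [5, 1, 4, 2, 3]:
    heapq.heappush(heap, value)
smallest = heap[0]
ordered = [heapq.heappop(heap) for _ in range(len(heap))]
print(smallest, ordered)             # 1 [1, 2, 3, 4, 5]
```

A balanced tree would reorganize itself on insert so that the sorted input also ends up near height 3; that reorganization is not implemented here.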

**Representing Relations in a Graph**

Graphs are another common data structure. Use a graph when the data has no natural top-down hierarchy, for example when items connect to one another in cycles.

In [86]:
graph = {'A': ['B', 'F'],
         'B': ['A','C'],
         'C': ['B', 'D'],
         'D' : ['C', 'E'],
         'E' : ['D','F'],
         'F' : ['E','A']}

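def find_path(graph, start, end, path=None):
  # Avoid a mutable default argument; build a new list on every call.
  path = (path or []) + [start]

  if start == end:
    print("Ending")
    return path

  for node in graph[start]:
    print("Checking Node ", node)

    if node not in path:
      print("Path so far ", path)

      newp = find_path(graph, node, end, path)
      if newp:
        return newp

  # No route from start to end through unvisited nodes.
  return None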

find_path(graph, 'B', 'E')

Checking Node  A
Path so far  ['B']
Checking Node  B
Checking Node  F
Path so far  ['B', 'A']
Checking Node  E
Path so far  ['B', 'A', 'F']
Ending


['B', 'A', 'F', 'E']
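
The recursive `find_path` explores depth-first and returns the first path it reaches, which is not necessarily the shortest. As a hedged alternative, the sketch below uses breadth-first search on the same graph; `shortest_path` is a hypothetical helper, not part of the original notebook.

```python
from collections import deque

graph = {'A': ['B', 'F'],
         'B': ['A', 'C'],
         'C': ['B', 'D'],
         'D': ['C', 'E'],
         'E': ['D', 'F'],
         'F': ['E', 'A']}


def shortest_path(graph, start, end):
    # Each queue entry is a complete path from start to its last node.
    queue = deque([[start]])
    visited = {start}

    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path
        for neighbor in graph[node]:
            # Mark nodes when queued so each one is expanded at most once.
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])

    # The end node was never reached.
    return None


print(shortest_path(graph, 'B', 'D'))   # ['B', 'C', 'D']
print(shortest_path(graph, 'B', 'E'))   # ['B', 'A', 'F', 'E']
```

For `'B'` to `'D'`, the depth-first version would walk the long way around through `'A'`, `'F'`, and `'E'`, while breadth-first search finds the two-edge route through `'C'`. For `'B'` to `'E'` both directions around the ring have three edges, so the result matches the notebook's output.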