# Who's registered twice?
Automation: A proof of concept

<img src="images/iterative_process_diagram_Slide01.jpg" alt="Iterative design process diagram" width="600">


As a growing organization, we take on new members each and every day, making it imperative that our records stay "clean". Having incomplete, incorrectly-input or duplicate records can seriously bloat the costs of managing information within an organization, but the process for developing technologies to counter this problem has never been more accessible than it is today.

<ul>
    <li>If there are incomplete records, members may not be accounted for in some cases.</li>
    <li>If some records contain incompatible formats or incorrect information, processing the data becomes much more cumbersome for a programmer.</li>
    <li>If duplicate entries are not caught as they occur, the costs of storing and using our own information are guaranteed to increase.</li>
    <li>Python is a programming language that is designed to be easily read and intuitive to implement.</li>
</ul>

### Coding useful implementations
<p style="font-size:14px">Python makes it quite easy to use code that we have previously developed, so it's also important to build programs with this in mind! 
    Through simple and modular<sup>1</sup> development, we can apply the same code to many similar problems or build upon previous ideas. Here we are going to import the contents of an entire file as well as a single function from Python programs which I built for this demonstration.</p> 

<br>

<p style="font-size:10px"><a href="https://en.wikipedia.org/wiki/Modular_programming" target="_blank">[1]</a><em> Modular programming approach: a software design technique that emphasizes separating the functionality of a program into independent, interchangeable modules, such that each contains everything necessary to execute only one aspect of the desired functionality.</em></p>

In [1]:
from duplicate_check_functions import duplicate_check
import duplicate_check_gen

These are two Python files I have created to handle our data today: duplicate_check_functions.py and duplicate_check_gen.py.


For our data I used a <a href="http://convertcsv.com/generate-test-data.htm" target="_blank">program</a> that creates randomly generated values for our chosen fields or "keys."

Our keys represent the different information that our members may provide at registration (such as names, email addresses, phone numbers, etc).

<br>

<p style="font-size:12px"><a href="https://en.wikipedia.org/wiki/Comma-separated_values" target="_blank">CSV</a><em>: A comma-separated values file. This file format is simple to work with, which is why I have chosen it for this project.</em></p>

In [2]:
# A list of our keys
keys = duplicate_check_gen.reader.fieldnames
print(keys)

['index', 'date', 'name', 'email', 'phone']


Knowing the exact names of our keys is important because text is case-sensitive in Python: 'name' and 'Name' are treated as two different keys. Having the keys in a list also makes them easy to reuse.
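To make the point about exact key names concrete, here is a minimal sketch using Python's standard `csv` module. The sample record is made up for illustration and is not taken from the real data file; only the column names match the keys listed above.

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO(
    "index,date,name,email,phone\n"
    "1,2020-01-01,Ada Lovelace,ada@example.com,555-0100\n"
)

reader = csv.DictReader(sample)
keys = reader.fieldnames   # the header row, in file order
row = next(reader)         # the first record, as a dictionary

print(keys)
print(row["name"])         # the exact key works
print("Name" in row)       # a different capitalization is not found
```

Looking up `row["Name"]` here would raise a `KeyError`, which is why it helps to print the keys before using them.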


From duplicate_check_functions.py we are importing a function I designed that opens two separate CSV files and compares their records on a key we choose.

In [3]:
duplicate_check(key='name')

Number of duplicates found: 13
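The comparison described above can be sketched in a few lines. This is only an illustration of the general idea, not the actual contents of `duplicate_check`: the helper name `find_duplicates`, the in-memory "files", and the sample records are all assumptions made for this example.

```python
import csv
import io

def find_duplicates(file_a, file_b, key):
    """Return rows from file_b whose `key` value also appears in file_a.

    Hypothetical helper for illustration only.
    """
    # Collect every value of the chosen key from the first file.
    seen = {row[key] for row in csv.DictReader(file_a)}
    # Keep the rows of the second file whose key value was already seen.
    return [row for row in csv.DictReader(file_b) if row[key] in seen]

# Made-up records standing in for two registration files.
existing = io.StringIO(
    "name,email\n"
    "Ada Lovelace,ada@example.com\n"
    "Alan Turing,alan@example.com\n"
)
incoming = io.StringIO(
    "name,email\n"
    "Grace Hopper,grace@example.com\n"
    "Ada Lovelace,ada2@example.com\n"
)

duplicates = find_duplicates(existing, incoming, key="name")
print("Number of duplicates found:", len(duplicates))
```

Storing the first file's values in a set keeps each lookup fast, so the approach still works comfortably as the files grow.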


Without further prompting, our program only checks for records duplicated by name and prints the number of duplicates. However, if we add an additional argument<sup>1</sup> to our function call, we can create a new CSV file containing these duplicates and tell our program what we would like its title to be.

<br>

<p style="font-size:12px">[1] Arguments are special inputs that our program is built to be able to work with. If we were to use an invalid argument, Python would not run the function but would instead raise an error reporting an "unexpected keyword argument".</p>
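As a small illustration of that error, the sketch below calls a function with a misspelled argument name. `duplicate_check_demo` is a hypothetical stand-in written for this example; it is not the real `duplicate_check` function.

```python
def duplicate_check_demo(key, title=None):
    """Hypothetical stand-in that accepts a key and an optional title."""
    return key, title

message = None
try:
    # 'tittle' is misspelled on purpose, so Python rejects the call.
    duplicate_check_demo(key="name", tittle="ORG-ABBV_date")
except TypeError as error:
    message = str(error)
    print(message)
```

Python stops before running any of the function's code, so a mistyped argument cannot silently produce a wrong result.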

In [4]:
duplicate_check(key='name', title='ORG-ABBV_date')

Number of duplicates found: 13
New file created!


We found the extra records and collected them into a new file! 
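Writing the collected records out can be sketched with the standard library's `csv.DictWriter`. The helper name `write_rows` and the sample record are assumptions for illustration, and the example writes to an in-memory buffer rather than a real file so that it can be run anywhere.

```python
import csv
import io

def write_rows(rows, fieldnames, out_file):
    """Write a header line followed by one CSV line per record.

    Hypothetical helper for illustration only.
    """
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# A made-up duplicate record.
duplicates = [{"name": "Ada Lovelace", "email": "ada2@example.com"}]

buffer = io.StringIO()
write_rows(duplicates, ["name", "email"], buffer)
print(buffer.getvalue())
```

To create an actual file, the buffer could be swapped for something like `open("ORG-ABBV_date.csv", "w", newline="")`.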


<p style="font-size:11px">Note: Creating a new file is not strictly necessary; however, I found it useful for showing how easily a program can be modified to meet our needs.</p>

---

## Conclusion

___

As we can see, there is abundant potential to automate our tedious office tasks, freeing up our time to support our organization in other ways and allowing us to share our efforts with others. Plenty of platforms let us manage and share our code for free, which brings benefits such as community collaboration and full transparency about how information is processed (see: <a href="https://github.com" target="_blank">github.com</a>).

---

Python is currently one of the most popular programming languages for automation. However, programming is only one form of problem-solving, and once its techniques are learned they can be generalized and applied in many programming languages, or even to non-computational problems.

Learning to program fosters an intuitive form of information literacy. Without necessarily studying advanced mathematical concepts, we can access raw information and uncover many important insights using processes built on mathematical foundations. This abstraction of mathematics allows us to process information as objects in relation to each other: variables are no longer just the x and y we all learned in algebra, but meaningful pieces of information, named accordingly.

Computational automation, like many things, does have trade-offs: computers are excellent at doing one thing repetitively, so while there are many tasks we may be able to take our hands off, it is important to ask if we need a computer at all.

# Author

<p style="font-size:18px">Lily Yake</p><p><em>She/Her, They/Them</em></p>

<a href="https://www.github.com/l-yake/" target="_blank">GitHub Profile</a>

<a href="https://www.linkedin.com/in/lily-yake/" target="_blank">LinkedIn Profile</a>

<br>

<p style="font-size:12px">Please contact me with any questions or concerns regarding this notebook or related programs.</p>