
initial commit

Clayton Hunter committed Mar 13, 2018
1 parent f694869 commit 88a5c0ed383863ed1339caec2cf274451f95b960
Showing with 15,344 additions and 0 deletions.
  1. +7 −0 .gitignore
  2. +204 −0 notebooks/session_01/AppliedAnalytics_GitCheatSheet.md
  3. BIN notebooks/session_01/AppliedDataAnalyticsClass1datasetoverview.pdf
  4. +150 −0 notebooks/session_01/merge_conflict.md
  5. BIN notebooks/session_01/phd101212s.gif
  6. +534 −0 notebooks/session_02/Notebook2b_SpatialAnalysis.ipynb
  7. 0 notebooks/session_02/images/Icon
  8. BIN notebooks/session_02/images/map_projections_xkcd.png
  9. BIN notebooks/session_02/images/pgAdmin-connected.png
  10. +207 −0 notebooks/session_03/03-exercise_answers.ipynb
  11. +75 −0 notebooks/session_03/appendix-pandas-CSV-to-database.ipynb
  12. +993 −0 notebooks/session_03/appendix-updating_a_data_table.ipynb
  13. +938 −0 notebooks/session_04/Record_Linkage.ipynb
  14. BIN notebooks/session_04/Regex.png
  15. +1,235 −0 notebooks/session_05/APIs-OpenTripPlanner.ipynb
  16. +851 −0 notebooks/session_05/Data-Visualization.ipynb
  17. +982 −0 notebooks/session_06/Introduction_to_Networks.ipynb
  18. +99 −0 notebooks/session_06/network_foodstamps.sql
  19. +2,067 −0 notebooks/session_07/machine_learning_with_extra_explanations.ipynb
  20. +2,458 −0 notebooks/session_09/Text_Analysis.ipynb
  21. 0 notebooks/session_10/03-images/Icon
  22. BIN notebooks/session_10/03-images/https.png
  23. BIN notebooks/session_10/03-images/inspect-01.png
  24. BIN notebooks/session_10/03-images/inspect.png
  25. BIN notebooks/session_10/03-images/inspected.png
  26. BIN notebooks/session_10/03-images/offices.png
  27. BIN notebooks/session_10/03-images/site.png
  28. BIN notebooks/session_10/03-images/table-code.png
  29. BIN notebooks/session_10/03-images/table.png
  30. BIN notebooks/session_10/03-images/test-website.png
  31. +417 −0 notebooks/session_10/Web_crawling.ipynb
  32. +965 −0 notebooks/session_10/Web_scraping.ipynb
  33. 0 notebooks/session_11/Data Setup/Icon
  34. +349 −0 notebooks/session_11/Data Setup/Inference Data Setup.ipynb
  35. +958 −0 notebooks/session_11/Inference Example.ipynb
  36. +862 −0 notebooks/session_11/machine_learning_recap_ildoc.ipynb
  37. +818 −0 notebooks/session_12/Privacy and Confidentiality Exercises.ipynb
  38. +175 −0 notebooks/session_12/The ADRF Export Process - A Walkthrough.ipynb
@@ -0,0 +1,7 @@
_site/
.sass-cache/
.jekyll-metadata
.DS_Store
.ipynb_checkpoints
data/
output/
@@ -0,0 +1,204 @@
# Applied Analytics Git CheatSheet
Every git command follows the same pattern: call `git`, name the action you want to perform, then pass any additional arguments that tell it how to carry out that action:
git <action> <parameters>
Example with action "add", adding "`my_file.txt`":
git add my_file.txt
## Git quickstart - add-commit-pull-push
To check in using git at the command line:
- First, go into the directory of the git repository in which you are working.
- Run `git status` to see what changes have been made:
git status
- Add any files or directories that are new or have been changed:
git add <file_name>
git add <directory_name>
git add README.md
git add *.py # you can use wild cards
- Once you've added all the files, commit.
git commit
- As part of the commit, git asks you to enter a commit message. On Unix and Mac, this opens your default terminal text editor.
- After committing, sync with the remote GitHub repository.
# first pull, to receive changes that are on the server, not on your computer.
git pull
# then, push your changes to github
git push
When you are collaborating with a team of developers, pulls sometimes force you to manually reconcile changes made to the same bits of code. If it is just you working alone in a repository, however, chances are your pull won't result in any changes or merges. It will just tell you there aren't any changes.
When you push, depending on how you cloned your repository, you will likely have to log in to GitLab.
## Git Concepts
**remote** - a remote is an external repository that the local repository syncs with. A given repository can have more than one remote. Standard remotes:
- origin -- default remote repository (i.e., the GitLab repo if you clone a repository from GitLab)
**branch** - a branch is a set of code changes that are kept separate from the main code base (or trunk) in a git repository. A branch can be worked on in isolation until one wants to merge the changes back into the trunk. Git makes it easy to create branches both in your local repository and in a remote. Standard branches:
- master -- default development branch
- HEAD -- the commit currently checked out (normally the tip of the current branch)
- HEAD^ -- the parent of HEAD
- HEAD~4 -- the great-great-grandparent of HEAD (four commits back)
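To make the `HEAD^` and `HEAD~n` notation concrete, here is a small Python sketch. It is a toy model rather than git itself: the `Commit` class, the `resolve` helper, and the commit names `c0`..`c4` are hypothetical, and every commit is assumed to have a single parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    """A toy commit (hypothetical): a name plus a link to its single parent."""
    name: str
    parent: Optional["Commit"] = None


def resolve(head: Commit, steps: int) -> Commit:
    """Walk `steps` parents back from `head`, mimicking HEAD~steps."""
    commit = head
    for _ in range(steps):
        if commit.parent is None:
            raise ValueError(f"{head.name} has fewer than {steps} ancestors")
        commit = commit.parent
    return commit


# Build a linear history c0 <- c1 <- c2 <- c3 <- c4, with HEAD at c4.
history = None
for i in range(5):
    history = Commit(f"c{i}", parent=history)
head = history

print(resolve(head, 0).name)  # HEAD   -> c4
print(resolve(head, 1).name)  # HEAD^  -> c3 (the parent)
print(resolve(head, 4).name)  # HEAD~4 -> c0 (four generations back)
```

Asking for more ancestors than exist (for example `resolve(head, 5)` here) raises a `ValueError`; in this toy model that stands in for git rejecting a revision it cannot resolve.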
## Set up a Git Configuration
```
# Adding some customization
git config --global user.name "Clark Kent"
git config --global user.email "clark.kent@dailyplanet.com"
git config --global color.ui "auto"
git config --global core.editor 'nano' # or vim, emacs, sublime
git config --global push.default current
```
## Create a Git Repo
### From an existing repo
```
git clone https://host.org/myproject.git # an external repo through HTTPS
git clone ssh://you@somehost.org/project.git # through SSH
git clone ~/some/repo.git ~/new/repo.git # through the filesystem
```
### From a new project
```
cd ~/myproject
git init # initialize the repo
git add . # add everything in the folder
```
## Stashing - Moving changes to the side
The `git stash` command lets you put aside a set of changes so that you can pull updated code from a remote. You can then either re-apply your stash of changes to the updated code files or discard them.
```
git stash # save modified and staged changes, then remove them from the working directory
git stash list # list the stashed changes in stack order
git stash pop # re-apply the changes from the top of the stash stack and remove them from the stack
git stash drop # discard the changes from the top of the stash stack
```
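The stack behaviour behind these commands can be sketched in Python. This is a toy model under stated assumptions, not how git stores stashes: the `ToyStash` class and its method names are hypothetical, and each "change set" is just a string.

```python
class ToyStash:
    """A toy model (hypothetical) of the stash: a stack with the newest entry on top."""

    def __init__(self):
        self._stack = []

    def save(self, changes):
        """Like `git stash`: put a set of changes on top of the stack."""
        self._stack.insert(0, changes)

    def entries(self):
        """Like `git stash list`: the newest entry is numbered stash@{0}."""
        return [f"stash@{{{i}}}: {c}" for i, c in enumerate(self._stack)]

    def pop(self):
        """Like `git stash pop`: hand back the newest changes and remove them."""
        return self._stack.pop(0)

    def drop(self):
        """Like `git stash drop`: discard the newest changes without applying them."""
        del self._stack[0]


stash = ToyStash()
stash.save("edit notebook")
stash.save("fix typo")
print(stash.entries())  # ['stash@{0}: fix typo', 'stash@{1}: edit notebook']

applied = stash.pop()   # re-applies the most recent stash
print(applied)          # fix typo
stash.drop()            # throws away 'edit notebook'
print(stash.entries())  # []
```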
## Other Useful Commands
```
git <command> --help # pull up documentation for a <command>
git status # check which files have been changed in the working directory
git log # get a history of changes
git checkout HEAD -- <somefile> # revert a file to its state at the last commit
git reset --hard # revert to the last commit; WARNING: discarded uncommitted changes cannot be recovered
```
## GitHub Flow
So far we have been doing the "solo" workflow, which looks something
like the following:
```
> mkdir my_working_directory
> cd my_working_directory
> git init
> touch some_file.py
# hack, do some work, hack
# hack
> git add some_file.py
> git commit -m "Working with some awesome idea"
> git push origin master
# hack
# more hacking
```
As you might have guessed, this workflow is just fine when you are
working by yourself. When you're working in a team, it's useful to
have a more structured workflow. Here we'll talk about the GitHub flow.
In the GitHub flow, *we never code anything unless there is a need to.*
When something needs to be done, we create an **issue** on the GitHub repository
for it. *Good* issues:
- Are clear
- Have a defined output
- Are actionable (written in the imperative voice)
- Can be completed in a few days (at most)
Here are some examples:
- *Good*: /Fix the bug in .../
- *Good*: /Add a method that does .../
- *Bad*: /Solve the project/
- *Bad*: /Some error happen/
[Here is how to create a GitLab issue.](https://docs.gitlab.com/ee/gitlab-basics/create-issue.html)
Once an issue exists, we'll pull from the repo and create a *branch*.
A *branch* is a copy of the code base separate from the main master branch
where we can work on our issue (e.g., fixing a bug, adding a feature) without
affecting the master branch during our work and then ultimately merge our
change into the master branch.
The flow goes something like this:
```
##Pull from the repo
> git pull
##Decide what you want to do and create an issue
> git checkout -b a-meaningful-name
```
The command `git checkout -b` creates a new branch (in this case
called "a-meaningful-name") and switches to that branch. We can see what
branch we are on by using the command `git branch`, which displays all
the branches in the local repository with a `*` next to the branch we are
currently on.
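As a small illustration of that output format, the following Python sketch picks the current branch out of text shaped like the output of `git branch`. The function name `current_branch` and the sample text are hypothetical, and the sketch does not call git itself.

```python
def current_branch(branch_output):
    """Return the branch marked with `*` in `git branch`-style text, or None."""
    for line in branch_output.splitlines():
        if line.startswith("* "):
            return line[2:].strip()
    return None


# Hypothetical sample of what `git branch` prints after `git checkout -b`.
sample = "* a-meaningful-name\n  master\n"
print(current_branch(sample))  # a-meaningful-name
```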
```
##
##hack, hack, hack, make some changes, add/rm files, commit
##
##Push to the repo and create a remote branch
> git push
##Create a pull request and describe your work (Suggest/add a reviewer)
##Someone then reviews your code
##The pull-request is closed and the remote branch is destroyed
##Switch to master locally
> git checkout master
##Pull the most recent changes (including yours)
> git pull
##Delete your local branch
> git branch -d a-meaningful-name
```
[Here is how to create a GitLab merge request (GitLab's name for a pull request).](https://docs.gitlab.com/ee/gitlab-basics/add-merge-request.html)
## Common Scenarios
### When you first start working ...
```
git pull
```
### After you have made some changes to a file, or whenever you finish working...
```
git add <filename>
git commit
git pull
git push
```
### If you try `git pull` and get an error message saying "Your local changes to the following files would be overwritten...Please stash or commit your changes"...
```
git stash
git pull
```
Binary file not shown.
@@ -0,0 +1,150 @@
# How to solve a merge conflict in a notebook using nbdime
## What is a conflict (when using git)?
As you collaborate with others on projects using git you will inevitably
run into merge conflicts. A merge conflict occurs when you and another person
edit the same line of a file. Git cannot know which version is the correct one,
so it reports a conflict.
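A minimal sketch of that rule for a single line, assuming a three-way comparison between the common ancestor (`base`), your version (`ours`), and the other person's version (`theirs`). The function name `merge_line` is hypothetical, and real git merges compare regions of changed lines rather than one line at a time.

```python
def merge_line(base, ours, theirs):
    """Merge one line three ways; return (merged_line, is_conflict)."""
    if ours == theirs:
        return ours, False      # both sides agree (or neither changed it)
    if ours == base:
        return theirs, False    # only they changed it: take their edit
    if theirs == base:
        return ours, False      # only we changed it: keep our edit
    return None, True           # both changed it differently: conflict


base = "y = np.sin(x)"
print(merge_line(base, "y = np.cos(x)", base))             # ('y = np.cos(x)', False)
print(merge_line(base, "y = np.cos(x)", "y = np.tan(x)"))  # (None, True)
```

Only the last case needs a human decision, which is why edits to different lines of the same file usually merge without any conflict.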
## Make a conflict
If we open up our example notebook `example.ipynb` we can see some very
basic code.
```
%pylab inline
x = list(range(100))
y = np.sin(x)
plt.plot(x,y)
```
This just plots a sine wave.
We are going to create a *branch* of our code. We will talk more about
this in later sessions; this is an easy way to mimic someone else modifying
our code. Our branch is called drama because we are going to
create arbitrary and unnecessary conflict.
```
git checkout -b drama
```
Open up the notebook `example.ipynb` and modify the `sin` function to
a `tan` function. Your code should look like this:
```
%pylab inline
x = list(range(100))
y = np.tan(x)
plt.plot(x,y)
```
Save the notebook and exit out of JupyterHub.
Let's commit our work.
```
git commit -am "changed sin to tan"
```
Let's go back to the main branch, the *master* branch.
```
git checkout master
```
Now we are going to change the same line of code. Just like before go into
JupyterHub and open the `example.ipynb` notebook. Now change the sin to a
cos function so your code looks like this:
```
%pylab inline
x = list(range(100))
y = np.cos(x)
plt.plot(x,y)
```
Save the notebook, exit out of JupyterHub, and commit the results like
before just with a different commit message:
```
git commit -am "changed sin to cos"
```
Now let's merge the `drama` branch with the `master` branch.
```
git merge drama
```
We should then have a conflict and see the following output:
```
[W nbmergeapp:57] Conflicts occured during merge operation.
[I nbmergeapp:70] Merge result written to .merge_file_4s7ea3
Auto-merging example.ipynb
CONFLICT (content): Merge conflict in example.ipynb
Automatic merge failed; fix conflicts and then commit the result.
```
## Solve a conflict
### Super Quick Way
When we have a conflict, it is between your version of the file
and someone else's. If you already know which version of the
file you want to keep, either your version or someone else's,
there is a shortcut.
First figure out which files are conflicting by running:
```
git status
```
This should tell you which files are in conflict. Then
if you want to keep your version:
```
git checkout --ours <name of the file>
git add <name of the file>
git commit -m "fixed conflict by saving my version"
```
If you would like to keep the other person's version:
```
git checkout --theirs <name of the file>
git add <name of the file>
git commit -m "fixed conflict by saving their version"
```
### Using nbdime
To solve this conflict we can use the `nbdime` tool.
```
git mergetool --tool nbdime
```
This should then bring up a graphical interface that shows the line in
question.
Save the notebook from this interface.
On the commandline, you should then see two notebooks.
```
> ls
example.ipynb example.ipynb.orig
```
We can rename `example.ipynb.orig` as `example.ipynb` and open the notebook
in JupyterHub.
```
mv example.ipynb.orig example.ipynb
```
You should see the following in the code cell:
```
%pylab inline
x = list(range(100))
<<<<<<<<< local
y = np.cos(x)
==============
y = np.tan(x)
>>>>>>>>> remote
plt.plot(x,y)
```
The `<<<<<<<` and `>>>>>>>` markers delimit the conflicting section of the code.
`local` means the lines that follow are from the `master` branch, while `remote`
means the preceding lines are from the `drama` branch. The lines of the two
branches are separated by `=======`. Given a merge conflict we have three
choices: keep the line from the master branch, keep the line from
the drama branch, or create an entirely new line. In this case we are going
to keep the line from the drama branch. Your code should look like this:
```
%pylab inline
x = list(range(100))
y = np.tan(x)
plt.plot(x,y)
```
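The marker layout described above can also be resolved programmatically. The following Python sketch is an illustration under stated assumptions, not part of git or nbdime: the function name `resolve_conflicts` and its `keep` parameter are hypothetical, and it works on the plain text of a code cell rather than on the notebook's JSON file.

```python
def resolve_conflicts(text, keep="remote"):
    """Return `text` with each conflict block replaced by one side.

    keep="local" keeps the lines above the ======= separator,
    keep="remote" keeps the lines below it.
    """
    if keep not in ("local", "remote"):
        raise ValueError("keep must be 'local' or 'remote'")
    resolved = []
    side = None  # None outside a conflict, otherwise "local" or "remote"
    for line in text.splitlines():
        if side is None and line.startswith("<<<<<<<"):
            side = "local"            # start of the conflicting section
        elif side == "local" and line.startswith("======="):
            side = "remote"           # separator between the two versions
        elif side == "remote" and line.startswith(">>>>>>>"):
            side = None               # end of the conflicting section
        elif side is None or side == keep:
            resolved.append(line)     # ordinary line, or the side we keep
    return "\n".join(resolved)


conflicted = """\
%pylab inline
x = list(range(100))
<<<<<<<<< local
y = np.cos(x)
==============
y = np.tan(x)
>>>>>>>>> remote
plt.plot(x,y)"""

print(resolve_conflicts(conflicted, keep="remote"))
```

With `keep="remote"` the printed cell contains the `np.tan` line from the `drama` branch; `keep="local"` would keep the `np.cos` line from `master` instead. Writing an entirely new line, the third choice, still has to be done by hand.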
Now save the notebook on JupyterHub and commit your work.
```
git commit -am "fixed merge conflict"
```
Now you are fully equipped to solve any merge conflicts that may emerge.
Binary file not shown.