<img src="../support_files/cropped-SummerWorkshop_Header.png">  

<h1 align="center">Python Bootcamp</h1> 
<h3 align="center">August 21-22, 2021</h3> 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<center><h1>Importing Modules</h1></center>

<p>When you first start up a Python interpreter (or a new Jupyter notebook), only a limited set of commands and objects is available for you to work with. Beyond these, there is a vast collection of tools that extend the functionality of Python, but they must be explicitly <i>imported</i> before you can use them.

<p>This is in contrast to tools like MATLAB, Igor, and R, which implicitly provide access to a larger array of functionality. This is a reflection of the fact that Python is a general-purpose language; whereas MATLAB assumes the user is interested in linear algebra, Python tries to make no assumptions about what functionality we will need.

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    <h2>Libraries</h2>

<p>Python itself comes with a large number of handy standard libraries that can be imported and used:  

<a href=https://docs.python.org/3.7/library/>Python Standard Library</a>
    
<p>However, the real power of Python for <i>scientific computing</i> comes from the wealth of open-source, community-driven packages that are available. Many of these popular external packages come bundled with the Anaconda Python distribution. Here's a full list of the packages that come with Anaconda: 
    
<a href=https://docs.continuum.io/anaconda/pkg-docs>Anaconda package lists</a>
    
<p>To help prepare you for the Summer Workshop on the Dynamic Brain, we are going to focus on four commonly used and extremely powerful scientific computing libraries:
<ul>
<li>Numpy - Matrix manipulation and optimized mathematical functions (<a href=http://www.numpy.org/>http://www.numpy.org</a>)
<li>Scipy - A suite of powerful scientific modules that extends Numpy's functionality (<a href=http://docs.scipy.org/doc/scipy/reference/>http://docs.scipy.org/doc/scipy/reference/</a>)
<li>Matplotlib - A Python 2D plotting library that produces publication-quality figures (<a href=http://matplotlib.org/>http://matplotlib.org/</a>)
<li>Pandas - A high-performance library for data manipulation and analysis (<a href=http://pandas.pydata.org/>http://pandas.pydata.org/</a>)
</ul>


</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2>Packages and Modules</h2>

<p>Before we get into the details of these packages, we need some definitions:

<p><b>Module:</b> 
<ol>
<li>A file that contains the code defining a set of Python functions, classes, and data.
<li>A type of Python object that contains a set of Python functions, classes, and data. Modules are loaded by Python using the <code>import</code> statement, and the functionality contained in the module is accessed as attributes of the module object.
</ol>

<p><b>Package:</b>
<ol>
<li>A special type of module (object) that contains submodules and/or subpackages, providing a hierarchical organization for modules. Packages, just like modules, are loaded using the <code>import</code> statement.
<li>A container of one or more related Python modules that can be installed together (for example, numpy is a package containing many submodules. It can be downloaded, installed, and imported.) The name "library" is sometimes used interchangeably with package.
</ol>
</div>
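<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>To make the distinction concrete, here is a small sketch that uses two standard-library names instead of numpy, so it runs anywhere: <code>math</code> is a plain module, while <code>json</code> is a package. It relies on the rule from Python's import system documentation that a module with a <code>__path__</code> attribute is treated as a package.
</div>

```python
import math
import types

import json
import json.decoder  # explicitly load a submodule of the json package

# Modules and packages are both ordinary objects of type ModuleType
print(isinstance(math, types.ModuleType), isinstance(json, types.ModuleType))  # True True

# A module that has a __path__ attribute is a package
print(hasattr(math, "__path__"))  # False: math is a plain module
print(hasattr(json, "__path__"))  # True: json is a package

# Submodules are reached as attributes of their parent package
print(json.decoder.__name__)  # json.decoder
```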

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p>Let's start by importing numpy. This is one of the most important libraries in the scientific Python stack; we will explore it in detail in the next section.

</div>

In [1]:
import numpy

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>This statement did two things:
<ol>
<li>Created a new module object by loading the "numpy" module
<li>Assigned the name "numpy" to this new object
</ol>

</div>
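<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>As a rough mental model (a simplification of the real import machinery), <code>import numpy</code> behaves like the two explicit steps below. This sketch uses the standard-library function <code>importlib.import_module</code> and the <code>sys.modules</code> cache, with <code>json</code> standing in for numpy.
</div>

```python
import importlib
import sys

# Step 1: load the module object (or reuse it if it was already imported)
module_object = importlib.import_module("json")

# Step 2: bind a name to that object in the current namespace
my_json = module_object

# Loaded modules are cached in sys.modules, so later imports reuse the same object
print("json" in sys.modules)  # True
import json
print(json is my_json)  # True: one module object, two names
```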

In [2]:
numpy

<module 'numpy' from '/home/luke/.local/lib/python3.6/site-packages/numpy/__init__.py'>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>The functionality contained within this module is accessed as <i>attributes</i>:

</div>

In [3]:
numpy.pi

3.141592653589793

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p>Numpy is a large library. Let's use the <code>dir</code> function, which you learned earlier, to view all of the attributes it provides.

</div>

In [4]:
print(dir(numpy))



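<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>The full listing is long. A list comprehension can trim it down: the sketch below hides the underscore-prefixed internal names and keeps only the names containing a search string. It uses the standard-library <code>math</code> module so it runs without numpy, but the same pattern works on <code>dir(numpy)</code>. It also shows <code>getattr</code>, which looks up an attribute from its name given as a string.
</div>

```python
import math

# Public names: skip the "private" ones that start with an underscore
public_names = [name for name in dir(math) if not name.startswith("_")]

# Narrow the list down to names containing "cos"
cos_names = [name for name in public_names if "cos" in name]
print(cos_names)  # ['acos', 'acosh', 'cos', 'cosh']

# getattr(module, "name") is equivalent to module.name
print(getattr(math, "pi") == math.pi)  # True
```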
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>We can also use the <code>dir</code> command without an argument to see what's in our notebook's global 'namespace':

</div>

In [5]:
print(dir())

['In', 'Out', '_', '_2', '_3', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dh', '_i', '_i1', '_i2', '_i3', '_i4', '_i5', '_ih', '_ii', '_iii', '_oh', 'exit', 'get_ipython', 'numpy', 'quit']


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Most of these are behind-the-scenes variables that your kernel is using to keep track of command history or other low-level attributes.

<p>At this point, the only name we've actually defined ourselves is <code>numpy</code>.

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p><b>Importing specific functions from a module</b>
<p>Another option is to import only the specific functions you need from a library:

</div>

In [6]:
from numpy import cos, pi

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Note that, if we now look at the global namespace using <code>dir</code>, the two names that we've just imported are now visible, which means that they're globally accessible.

</div>

In [7]:
print(dir())

['In', 'Out', '_', '_2', '_3', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dh', '_i', '_i1', '_i2', '_i3', '_i4', '_i5', '_i6', '_i7', '_ih', '_ii', '_iii', '_oh', 'cos', 'exit', 'get_ipython', 'numpy', 'pi', 'quit']


In [8]:
cos(pi)

-1.0

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p><b>Importing all attributes into the global namespace - A word of caution:</b>
<p>It is possible to import all objects from a module:
<pre>
    from numpy import *
</pre>

<p>THIS IS GENERALLY DISCOURAGED. It makes a mess of your namespace, it makes it difficult to tell where any given name was defined, and it increases the likelihood of name conflicts. In addition, contrary to what the syntax suggests, it does not "import everything". The module itself decides what gets imported (through an optional list named <code>__all__</code>; without one, names starting with an underscore are skipped), so it can be difficult to predict what you will get.

</div>
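<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Two small illustrations of these problems. The first shows a name conflict: <code>from math import *</code> quietly replaces Python's built-in <code>pow</code> with <code>math.pow</code>. The second shows a module limiting what <code>*</code> imports by defining <code>__all__</code>. The module <code>star_demo</code> is a hypothetical throwaway built in memory, and <code>exec</code> is used only to keep these experiments out of your notebook's namespace.
</div>

```python
import sys
import types

# 1) Name conflicts: math defines its own pow, which shadows the built-in one
scratch = {}
exec(
    "before = pow(2, 3)\n"   # built-in pow returns an int
    "from math import *\n"   # silently rebinds pow to math.pow
    "after = pow(2, 3)\n",   # math.pow returns a float
    scratch,
)
print(scratch["before"], scratch["after"])  # 8 8.0

# 2) "import *" only brings in what the module chooses to export.
# Build a hypothetical in-memory module named "star_demo" that defines __all__.
star_demo = types.ModuleType("star_demo")
exec(
    "__all__ = ['exported']\n"
    "exported = 1\n"
    "not_listed = 2\n",
    star_demo.__dict__,
)
sys.modules["star_demo"] = star_demo  # register it so the import statement can find it

namespace = {}
exec("from star_demo import *", namespace)
print(sorted(name for name in namespace if name != "__builtins__"))  # ['exported']
```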

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p><b>This is a common way to import:</b>
<p>Packages can be renamed on import, and standard shorthand names have developed for various packages. For instance, Numpy is commonly imported as <code>np</code>. Following this convention is not required, but doing so makes it easier to share code.
</div>

In [9]:
import numpy as np

In [10]:
print(np.sqrt(np.pi))

1.7724538509055159
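<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>The <code>as</code> keyword only changes which name gets bound; the module object itself is unchanged. The sketch below (using standard-library modules so it runs anywhere) shows that after <code>import json as js</code> only <code>js</code> exists, and that <code>as</code> also works with <code>from ... import</code>. The names <code>js</code> and <code>osp</code> are arbitrary choices for this example.
</div>

```python
import os

# Run the imports in a separate namespace so we can see exactly which names they create
scratch = {}
exec("import json as js\nfrom os import path as osp", scratch)

print("js" in scratch)    # True: the alias is bound
print("json" in scratch)  # False: the original name is not
print(scratch["osp"] is os.path)  # True: "as" also works with "from ... import"
```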


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2>Installing new modules</h2>

<p>Python comes with a large set of useful packages pre-installed, and Anaconda builds on those with an even larger set of computational tools. But every once in a while, we may need to install something new.

<p>The most common place for developers to publish new packages is on the <a href=http://pypi.python.org>Python Package Index</a>, and the easiest way to install these packages is with the <code>pip</code> command. 

<p>Open a command prompt:
<ul>
<li>On Windows, ...
<li>On macOS, ...
<li>On Linux, ...
</ul>


<div style='color: #CCC; background-color: #000; font-family: monospace; white-space: pre; padding: 15px; margin: 10px; margin-left: 30px'><span style="color: #8DA; font-weight: bold;">$ </span><span style="color: #FFF; font-weight: bold;">pip install git+https://github.com/alleninstitute/allensdk@swdb_2019</span>

Collecting allensdk
  Downloading allensdk-0.13.2.tar.bz2 (377kB)
    100% |████████████████████████████████| 378kB 1.5MB/s 
. . .
Successfully installed allensdk-0.13.2
</div>

Under normal circumstances we would have used <code>pip install allensdk</code>. For this workshop we will be using a prerelease version of <code>allensdk</code>, which requires the more complicated command above.

We have just installed a set of tools published by the Allen Institute that will allow us to access its released data. The <code>pip</code> command automatically downloads and installs the package (from PyPI by default, or, as in the command above, from a GitHub branch), along with any other packages that it depends on.

Let's make sure the package is importable:
<div> <!-- NOTE: this div is a workaround for a jupyter HTML export bug --> </div>
</div>

In [11]:
import allensdk
allensdk

<module 'allensdk' from '/home/luke/anaconda3/lib/python3.6/site-packages/allensdk/__init__.py'>
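<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Importing is the simplest check, but sometimes you only want to know whether a package is installed and where. The standard-library function <code>importlib.util.find_spec</code> searches for a top-level module without running it and returns <code>None</code> when nothing is found. In this sketch <code>find_module_location</code> is a hypothetical helper name, and <code>json</code> stands in for <code>allensdk</code> so it runs without the Allen SDK installed.
</div>

```python
import importlib.util

def find_module_location(name):
    """Hypothetical helper: return the file a module would load from, or None if not found."""
    spec = importlib.util.find_spec(name)  # searches sys.path without executing the module
    if spec is None:
        return None
    return spec.origin

print(find_module_location("json"))               # path to json/__init__.py in your install
print(find_module_location("not_a_real_module"))  # None
```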

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2>Module import paths</h2>

<p>When you imported numpy and allensdk above, you didn't need to specify the location of the module files. How did Python know where to find them?

<p>Python keeps a list of locations on your filesystem where installed packages may reside. When we ask Python to import a new module, it checks each location in the list, in order, for a module or package with the name you specified, and uses the first match it finds.

<p>We can access this list from the <code>sys</code> module, which is part of the Python standard library:
</div>

In [12]:
import sys
sys.path

['',
 '/home/luke/anaconda3/lib/python36.zip',
 '/home/luke/anaconda3/lib/python3.6',
 '/home/luke/anaconda3/lib/python3.6/lib-dynload',
 '/home/luke/.local/lib/python3.6/site-packages',
 '/home/luke/anaconda3/lib/python3.6/site-packages',
 '/home/luke/pyqtgraph',
 '/home/luke/anaconda3/lib/python3.6/site-packages/IPython/extensions',
 '/home/luke/.ipython']
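<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>To see this search in action, the sketch below writes a tiny module into a temporary folder, shows that it can't be imported, then appends the folder to <code>sys.path</code> and imports it successfully. The module name <code>hello_mod</code> is hypothetical, and editing <code>sys.path</code> by hand like this is only meant as an experiment (it is undone at the end).
</div>

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway folder containing a tiny module (hypothetical name "hello_mod")
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "hello_mod.py"), "w") as f:
    f.write("GREETING = 'found me'\n")

# The folder is not on sys.path yet, so Python can't find the module
try:
    import hello_mod
except ModuleNotFoundError as err:
    print(err)  # No module named 'hello_mod'

# Add the folder to the search list, then try again
sys.path.append(folder)
importlib.invalidate_caches()  # make the import system notice the newly created file
import hello_mod
print(hello_mod.GREETING)  # found me

sys.path.remove(folder)  # undo the change to sys.path
```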

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>When we install new packages, their files must be added to one of the entries in <code>sys.path</code>; otherwise, they will not be importable. (We could also modify <code>sys.path</code> to include the new location of the package, but this is generally discouraged.)

<p>Let's see what happens when we try to import a module that cannot be found:
</div>

In [13]:
import not_a_real_module

ModuleNotFoundError: No module named 'not_a_real_module'

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>The error above is a common source of frustration for Python beginners, and it happens most frequently when there are multiple Python environments installed on a machine. When we install new packages, it is important to be aware of <i>which environment</i> the package is being installed into. If an import fails, find the actual installed location of the module files and verify that it is indeed listed in your <code>sys.path</code>. 

</div>
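<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>A quick way to see which environment your notebook is using is to print <code>sys.executable</code> (the interpreter that is running), then look for the <code>sys.path</code> entry that contains a module's files. The sketch below does this for the standard-library <code>json</code> package; <code>containing_path_entry</code> is a hypothetical helper that only handles top-level modules. Running pip as <code>python -m pip</code> with that same interpreter is a common way to make sure new packages land in this environment.
</div>

```python
import json
import os
import sys

print(sys.executable)  # the interpreter (and hence the environment) running this code

def containing_path_entry(module):
    """Hypothetical helper: return the sys.path entry a top-level module was found in, or None."""
    location = os.path.dirname(module.__file__)
    if hasattr(module, "__path__"):
        # For a package, __file__ is <entry>/<package>/__init__.py, so go up one more level
        location = os.path.dirname(location)
    location = os.path.realpath(location)
    for entry in sys.path:
        # An empty string in sys.path stands for the current working directory
        if os.path.realpath(entry or os.getcwd()) == location:
            return entry
    return None

print(json.__file__)                # where the json package lives on disk
print(containing_path_entry(json))  # the matching sys.path entry
```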

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 3.1:</b>
<p>Find the location of the allensdk module on your disk and verify that it is in <code>sys.path</code>.
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2>Using the OS module for dealing with paths, filenames and other miscellaneous operating system functions</h2>

<p>Directories (folders) and files on your hard drive(s) are organized in a tree structure. Nodes are directories and files, while edges indicate containment. As you navigate from directory to directory, you trace out a path through this tree. Paths can be represented as strings like this on Linux and macOS:

<p>&nbsp;&nbsp;&nbsp;&nbsp;<code>path = "/folder_a/folder_b/file.name"</code>

<p>And like this on Windows:

<p>&nbsp;&nbsp;&nbsp;&nbsp;<code>path = "C:\\folder_a\\folder_b\\file.name"</code>

<p>We use these path strings when instructing Python to read and write files.

<p>Dealing with paths will be substantially easier if you first import the <code>os</code> module:

</div>

In [14]:
import os
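<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Besides building paths, <code>os.path</code> can also take them apart. <code>os.path</code> follows the conventions of whatever system you're on, so to get the same results on any computer this sketch calls the standard-library modules behind it directly: <code>posixpath</code> (Linux/macOS rules) and <code>ntpath</code> (Windows rules). In your own code, <code>os.path</code> picks the right one automatically. The sketch also shows that a raw string (<code>r"..."</code>) is an easier way to write a Windows path than doubling every backslash.
</div>

```python
import ntpath     # Windows path rules, importable on every platform
import posixpath  # Linux/macOS path rules, importable on every platform

path = "/folder_a/folder_b/file.name"
print(posixpath.dirname(path))   # /folder_a/folder_b
print(posixpath.basename(path))  # file.name
print(posixpath.splitext(path))  # ('/folder_a/folder_b/file', '.name')

# A raw string keeps backslashes literally, so it equals the escaped version
win_path = r"C:\folder_a\folder_b\file.name"
print(win_path == "C:\\folder_a\\folder_b\\file.name")  # True
print(ntpath.basename(win_path))  # file.name
```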

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 3.2:</b>
<p><code>os.path.join</code> is a simple function that joins together multiple folder names using either a forward slash <code>/</code> or backslash <code>\</code> depending on the operating system. 

<p>&nbsp;&nbsp;&nbsp;&nbsp;<code>os.path.join('Users', 'my_username', 'my_data_folder')</code>

<p>This is a convenient way to generate path names in code that you want to be portable.

<p>Replace the names above with real folder names from a path on your system and see what is output:

</div>

In [15]:
name_1 = 'users'
name_2 = 'me'
name_3 = 'desktop'
print(os.path.join(name_1, name_2, name_3))

users/me/desktop
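<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>The output above came from a Linux/macOS machine. To see both conventions side by side, the sketch below calls <code>posixpath.join</code> and <code>ntpath.join</code> directly (the modules behind <code>os.path</code> on each platform). It also shows a common surprise: if one of the pieces is an absolute path, <code>join</code> discards everything before it.
</div>

```python
import ntpath
import posixpath

parts = ("users", "me", "desktop")
print(posixpath.join(*parts))  # users/me/desktop  (what os.path.join gives on Linux/macOS)
print(ntpath.join(*parts))     # users\me\desktop  (what os.path.join gives on Windows)

# Gotcha: an absolute component throws away everything before it
print(posixpath.join("users", "/tmp", "data"))  # /tmp/data
```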


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Two more handy functions are <code>os.getcwd</code>, which returns the current working directory, and <code>os.listdir</code>, which lists the contents of a directory:
</div>

In [16]:
print("current directory:")
print(os.getcwd())
print('\n')
print("files in current directory:")
os.listdir(os.getcwd())

current directory:
/home/luke/SWDB_2018/PythonBootcamp/solutions


files in current directory:


['05_Custom_Modules_and_Version_Control_solutions.html',
 '01_Basic_Python_I_Object_and_Data_Structures_solutions.html',
 '.ipynb_checkpoints',
 '06_Introduction_To_Matplotlib_solutions.ipynb',
 '09_bike_crossing_solutions.ipynb',
 '03_Intro_To_Scientific_Computing_solutions.html',
 '05_Custom_Modules_and_Version_Control_solutions.ipynb',
 '01_Basic_Python_I_Object_and_Data_Structures_solutions.ipynb',
 '04_Introduction_To_Numpy_solutions.html',
 '04_Introduction_To_Numpy_solutions.ipynb',
 '02_Basic_Python_II_Control_Flow_and_Functions_solutions.html',
 '09_bike_crossing_solutions.html',
 '11_Image_data_solutions.html',
 '02_Basic_Python_II_Control_Flow_and_Functions_solutions.ipynb',
 '06_Introduction_To_Matplotlib_solutions.html',
 '11_Image_data_solutions.ipynb',
 '07_Introduction_To_Pandas_solutions.ipynb',
 '07_Introduction_To_Pandas_solutions.html',
 '03_Intro_To_Scientific_Computing_solutions.ipynb']
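<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p><code>os.listdir</code> returns names in no particular order, which is why the list above looks shuffled. The sketch below runs on a temporary folder with a few made-up file names (so the result is predictable), sorts the names, and keeps only notebooks by checking the file extension with <code>os.path.splitext</code>.
</div>

```python
import os
import tempfile

# Make a throwaway folder with a few hypothetical files in it
with tempfile.TemporaryDirectory() as folder:
    for name in ["b_notes.ipynb", "a_notes.ipynb", "a_notes.html"]:
        open(os.path.join(folder, name), "w").close()  # create an empty file

    # listdir order is arbitrary, so sort explicitly; splitext separates the extension
    notebooks = sorted(name for name in os.listdir(folder)
                       if os.path.splitext(name)[1] == ".ipynb")

print(notebooks)  # ['a_notes.ipynb', 'b_notes.ipynb']
```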

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>To find out if a path actually exists on your drive, use:
<p>&nbsp;&nbsp;&nbsp;&nbsp;<code>os.path.exists(path)</code>
</div>
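<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p><code>os.path.exists</code> returns <code>True</code> for both files and folders. When the difference matters, <code>os.path.isdir</code> and <code>os.path.isfile</code> are more specific. The sketch below checks all three against a temporary folder, a file inside it, and a path that doesn't exist (the file names are made up).
</div>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "data.txt")        # hypothetical file name
    missing_path = os.path.join(folder, "missing.txt")  # never created
    open(file_path, "w").close()                        # create an empty file

    folder_checks = (os.path.exists(folder), os.path.isdir(folder), os.path.isfile(folder))
    file_checks = (os.path.exists(file_path), os.path.isdir(file_path), os.path.isfile(file_path))
    missing_checks = (os.path.exists(missing_path), os.path.isdir(missing_path),
                      os.path.isfile(missing_path))

print("folder  (exists, isdir, isfile):", folder_checks)   # (True, True, False)
print("file    (exists, isdir, isfile):", file_checks)     # (True, False, True)
print("missing (exists, isdir, isfile):", missing_checks)  # (False, False, False)
```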

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 3.3:</b>
<p>Create a folder called <code>TempFolder</code> somewhere on your local drive:
<ol>
<li>Generate the full path using the <code>os.path.join</code> function
<li>Use the <code>os.path.exists</code> function to ensure that the folder doesn't already exist (confirm that it returns False)
<li>Create the directory using <code>os.mkdir</code>
<li>Confirm that it now exists using the <code>os.path.exists</code> function again
<li>Navigate to the folder using your file explorer to confirm that it's there
</ol>
</div>

In [17]:
new_path = os.path.join('..', 'support_files', 'TempFolder')

print(os.path.exists(new_path))

os.mkdir(new_path)

print(os.path.exists(new_path))

False
True
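<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Running the cell above a second time raises <code>FileExistsError</code>, because <code>os.mkdir</code> refuses to create a folder that is already there. When that's not what you want, <code>os.makedirs</code> with <code>exist_ok=True</code> creates the folder (including any missing parent folders) and does nothing if it already exists. This sketch works in a temporary location rather than <code>../support_files</code>.
</div>

```python
import os
import tempfile

base = tempfile.mkdtemp()  # a fresh temporary folder to experiment in
new_path = os.path.join(base, "TempFolder")

os.mkdir(new_path)
try:
    os.mkdir(new_path)  # second call: the folder is already there
except FileExistsError:
    print("os.mkdir raised FileExistsError")

os.makedirs(new_path, exist_ok=True)  # no error when the folder already exists

nested = os.path.join(base, "parent", "child")
os.makedirs(nested, exist_ok=True)  # also creates the missing "parent" folder
print(os.path.isdir(nested))  # True
```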


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<left><h1>A Brief Introduction to Scipy</h1></left>
<p>The Scipy library has a lot of useful functions for signal processing, linear algebra, statistical testing, etc. Given its breadth, and its overlap with Numpy, we've elected not to cover it in detail here.

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p><b>The Scipy library has ~20 subpackages</b>
<p>Each subpackage contains many useful functions:

<ol>
<li><b>Clustering package</b> (scipy.cluster)
<li>Constants (scipy.constants)
<li><b>Discrete Fourier transforms</b> (scipy.fft; scipy.fftpack is the older, legacy interface)
<li>Integration and ODEs (scipy.integrate)
<li>Interpolation (scipy.interpolate)
<li>Input and output (scipy.io)
<li><b>Linear algebra</b> (scipy.linalg)
<li>Miscellaneous routines (scipy.misc)
<li><b>Multi-dimensional image processing</b> (scipy.ndimage)
<li>Orthogonal distance regression (scipy.odr)
<li>Optimization and root finding (scipy.optimize)
<li><b>Signal processing</b> (scipy.signal)
<li>Sparse matrices (scipy.sparse)
<li>Sparse linear algebra (scipy.sparse.linalg)
<li>Compressed Sparse Graph Routines (scipy.sparse.csgraph)
<li>Spatial algorithms and data structures (scipy.spatial)
<li>Special functions (scipy.special)
<li><b>Statistical functions</b> (scipy.stats)
<li>Statistical functions for masked arrays (scipy.stats.mstats)
<li>C/C++ integration (scipy.weave, which has since been removed from SciPy and is now a separate <code>weave</code> package)
</ol>
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p><b>Exercise 3.4:</b>
<p>Spend a few minutes exploring the scipy documentation here:
<p><a href=http://docs.scipy.org/doc/scipy/reference/>scipy reference</a>


</div>