In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Basic File Input Pattern

In [2]:
data = []
# The with-statement closes the file even if a line fails to parse.
with open('float_data.txt') as myfile:
    for line in myfile:
        # Split on whitespace and convert each field to a float.
        numbers = line.split()
        data.append([float(x) for x in numbers])

data = np.array(data)
print(data)

[[ 1.  2.  3.  4.]
 [ 5.  6.  7.  8.]]


## Numpy way of loading data from text

In [3]:
arr = np.loadtxt('float_data.txt')
print(arr)
# Select columns 0, 1 and 3 of every row with a list of column indices.
print(arr[:, [0, 1, 3]])

[[ 1.  2.  3.  4.]
 [ 5.  6.  7.  8.]]
[[ 1.  2.  4.]
 [ 5.  6.  8.]]


## Numpy way of loading complicated text

Have a look at the text files first. `float_data_with_header.txt` starts with a header line, so `skiprows=1` is used to skip it.

In [4]:
arr = np.loadtxt('float_data_with_header.txt',skiprows=1)
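
As a small self-contained illustration of `skiprows`, the sketch below reads from an in-memory buffer instead of a file on disk. The header text and the values are made up for the example; they are not the contents of `float_data_with_header.txt`.

```python
import io
import numpy as np

# Hypothetical file contents: one header line followed by numeric rows.
text = io.StringIO("a b c d\n1 2 3 4\n5 6 7 8\n")

# skiprows=1 drops the first line so only the numeric rows are parsed.
arr_demo = np.loadtxt(text, skiprows=1)
print(arr_demo.shape)
```

`np.loadtxt` accepts a file name or a file-like object, which makes this pattern handy for quick experiments.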

## Exercise

Load the data from `complex_data_file.txt`, ignoring the comments and the 4th column.

Tip: run `np.loadtxt?` to display the documentation.

The output should be:

    array([[   1,    1, 2000,   30],
           [   2,    1, 2000,   41],
           [   4,    1, 2000,   55],
           [   5,    1, 2000,   78],
           [   6,    1, 2000,  134],
           [   7,    1, 2000,   42]])
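
The keyword arguments most relevant to this exercise are `comments`, `usecols` and `dtype`. The sketch below applies them to made-up in-memory data. The layout of `complex_data_file.txt` is not shown here, so the comment character, the delimiter and the column positions are all assumptions.

```python
import io
import numpy as np

# Hypothetical data: a comment line, five columns, and a trailing comment.
text = io.StringIO(
    "# day month year extra value\n"
    "1 1 2000 9.5 30\n"
    "2 1 2000 8.1 41  # trailing comment\n"
)

# comments="#" discards everything after '#'; usecols skips column index 3
# (the 4th column); dtype=int parses the remaining columns as integers.
demo = np.loadtxt(text, comments="#", usecols=(0, 1, 2, 4), dtype=int)
print(demo)
```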

In [73]:
np.loadtxt?

In [5]:
np.savetxt('mysaveddata.txt',arr)
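
By default `np.savetxt` writes every value in scientific notation (`fmt='%.18e'`) separated by spaces. The `fmt` and `delimiter` arguments control the layout. A minimal round-trip sketch, using a temporary directory and a hypothetical file name `demo.csv`:

```python
import os
import tempfile
import numpy as np

values = np.array([[1.0, 2.0], [3.0, 4.5]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.csv")  # hypothetical file name
    # Two decimals per value, comma-separated.
    np.savetxt(path, values, fmt="%.2f", delimiter=",")
    with open(path) as f:
        saved_text = f.read()
    # Read the file back with the same delimiter.
    restored = np.loadtxt(path, delimiter=",")

print(saved_text)
```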

#### Tips:
- Use the numpy loading functions when the file holds a single, consistent data type, and go for pandas when the file contains a mixture of data types.
- Use the h5py library (HDF5 format) for loading and saving large files; binary storage is generally more compact and faster to read than CSV or plain text.
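
To make the first tip concrete, here is a small sketch of reading mixed-type data with pandas. It assumes pandas is installed; the column names and values are invented for the example.

```python
import io
import pandas as pd

# Hypothetical CSV with integer, string and float columns.
text = io.StringIO("id,label,score\n1,dog,0.9\n2,cat,0.75\n")

# read_csv infers a separate dtype for each column.
df = pd.read_csv(text)
print(df.dtypes)

# The purely numeric columns can still be handed to numpy.
numeric = df[["id", "score"]].to_numpy()
print(numeric.shape)
```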

<img src="imgs/fileformats.PNG" height="600" width = "800" >

## Exercise

For the following image, write a single line of code to slice each of the marked colored regions.
<img src="imgs/Ex1.PNG" height="400" width = "400" >

In [6]:
## how to create this array ?

## try to use broadcasting to create the above array

## slice the green 

## slice the red

## slice the blue

## slice the orange
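
One way to approach the broadcasting hint, assuming the target is a grid whose entries combine a row offset with a column offset (the image is not reproduced here): add a column vector to a row vector and let numpy broadcast them to 2-D. The 6x6 shape and the step of 10 are illustrative choices only, and the slices shown are generic examples rather than the colored regions.

```python
import numpy as np

rows = 10 * np.arange(6)[:, np.newaxis]  # column vector, shape (6, 1)
cols = np.arange(6)                      # row vector, shape (6,)
grid = rows + cols                       # broadcast to shape (6, 6)
print(grid)

# Generic slicing patterns on the grid:
print(grid[0, 3:5])   # part of a single row
print(grid[4:, 4:])   # a rectangular block in the bottom-right corner
print(grid[:, 2])     # one full column
```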

### Numpy fancy indexing

The goal here is to select elements that do not form a regular pattern, which basic slicing cannot express.
<img src="imgs/im2.PNG" height="200" width = "400" >

In [76]:
a = np.arange(0,80,10)

In [77]:
## indexing by indices
indices = [1,2,-3]
y = a[indices]


## indexing by masks
mask1 = a < 30
mask2 = a > 0
mask3 = a == 50
# Combine the masks: (0 < a < 30) or (a == 50)
mask = (mask1 & mask2) | mask3
z = a[mask]
print(y, z)

[10 20 50] [10 20 50]
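
Two properties of fancy indexing are easy to miss: selecting with a mask or an index list returns a copy rather than a view, while assigning through a mask modifies the array in place. A boolean mask can also be converted to integer positions with `np.flatnonzero`. A short sketch on the same kind of array (the variable names are illustrative):

```python
import numpy as np

a_demo = np.arange(0, 80, 10)

# Element-wise logic uses & and |, with each comparison in parentheses.
mask_demo = ((a_demo > 0) & (a_demo < 30)) | (a_demo == 50)

# Integer positions of the True entries.
positions = np.flatnonzero(mask_demo)
print(positions)

# Fancy indexing returns a copy, so this leaves a_demo untouched.
picked = a_demo[mask_demo]
picked[0] = -1
unchanged_value = a_demo[1]
print(unchanged_value)

# Assignment through the mask writes into a_demo itself.
a_demo[mask_demo] = 0
print(a_demo)
```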


In [101]:
## import the classids.txt file as a dictionary with a single expression
with open('classids.txt') as f:
    # Each line is "<id> <name>"; the pair becomes one key/value entry.
    classids_dict = dict(line.strip().split(' ') for line in f)
classids_dict

{'1': 'dog',
 '2': 'cat',
 '3': 'car',
 '4': 'motorbike',
 '5': 'man',
 '6': 'child',
 '7': 'door',
 '8': 'window'}
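
The keys above are strings. If integer keys are more convenient, or the file may contain blank lines, a slightly longer loop is easier to control. The sketch below uses made-up in-memory lines in place of `classids.txt`.

```python
import io

# Hypothetical contents, including a blank line that should be skipped.
lines = io.StringIO("1 dog\n2 cat\n\n3 car\n")

ids_demo = {}
for line in lines:
    # Split into at most two parts: the id and the rest of the line.
    parts = line.split(maxsplit=1)
    if len(parts) != 2:
        continue  # skip blank or malformed lines
    key, name = parts
    ids_demo[int(key)] = name.strip()

print(ids_demo)
```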

In [103]:
import glob
# glob.glob returns a list directly; the raw string avoids doubled backslashes.
fnames = glob.glob(r'E:\wheelchair\bicycle\*.jpg')

print(fnames[:5])

['E:\\wheelchair\\bicycle\\bicycle0000.jpg', 'E:\\wheelchair\\bicycle\\bicycle0001.jpg', 'E:\\wheelchair\\bicycle\\bicycle0002.jpg', 'E:\\wheelchair\\bicycle\\bicycle0003.jpg', 'E:\\wheelchair\\bicycle\\bicycle0004.jpg']
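
The order in which matching files come back depends on the operating system, so sorting the result is a sensible habit when order matters. A minimal sketch with `pathlib`, using a temporary directory and invented file names in place of the image folder above:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    folder = Path(tmpdir)
    # Create a few empty files (hypothetical names) to search through.
    for name in ["img0002.jpg", "img0000.jpg", "notes.txt", "img0001.jpg"]:
        (folder / name).touch()

    # Path.glob yields matching paths; sorted() makes the order deterministic.
    jpg_names = sorted(p.name for p in folder.glob("*.jpg"))

print(jpg_names)
```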


### Exercise on Selection and Slicing

In [None]:
# Copyright 2016 Enthought, Inc. All Rights Reserved
"""
Dow Selection
-------------

Topics: Boolean array operators, sum function, where function, plotting.

The array 'dow' is a 2-D array with each row holding the
daily performance of the Dow Jones Industrial Average from the
beginning of 2008 (dates have been removed for exercise simplicity).
The array has the following structure::

       OPEN      HIGH      LOW       CLOSE     VOLUME      ADJ_CLOSE
       13261.82  13338.23  12969.42  13043.96  3452650000  13043.96
       13044.12  13197.43  12968.44  13056.72  3429500000  13056.72
       13046.56  13049.65  12740.51  12800.18  4166000000  12800.18
       12801.15  12984.95  12640.44  12827.49  4221260000  12827.49
       12820.9   12998.11  12511.03  12589.07  4705390000  12589.07
       12590.21  12814.97  12431.53  12735.31  5351030000  12735.31

0. The data has been loaded from a .csv file for you.
1. Create a "mask" array that indicates which rows have a volume
   greater than 5.5 billion.
2. How many are there?  (hint: use sum).
3. Find the index of every row (or day) where the volume is greater
   than 5.5 billion. hint: look at the where() command.

Bonus
~~~~~

1. Plot the adjusted close for *every* day in 2008.
2. Now over-plot this plot with a 'red dot' marker for every
   day where the volume was greater than 5.5 billion.

See :ref:`dow-selection-solution`.
"""

from numpy import loadtxt, sum, where
# pyplot.hold was removed in Matplotlib 3.0; successive plot() calls
# draw on the same axes by default.
from matplotlib.pyplot import figure, plot, show

# Constants that indicate what data is held in each column of
# the 'dow' array.
OPEN = 0
HIGH = 1
LOW = 2
CLOSE = 3
VOLUME = 4
ADJ_CLOSE = 5

# 0. The data has been loaded from a .csv file for you.

# 'dow' is our NumPy array that we will manipulate.
dow = loadtxt('dow.csv', delimiter=',')

# 1. Create a "mask" array that indicates which rows have a volume
#    greater than 5.5 billion.


# 2. How many are there?  (hint: use sum).

# 3. Find the index of every row (or day) where the volume is greater
#    than 5.5 billion. hint: look at the where() command.

# BONUS:
# a. Plot the adjusted close for EVERY day in 2008.
# b. Now over-plot this plot with a 'red dot' marker for every
#    day where the volume was greater than 5.5 billion.
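
As a hint for the mask, `sum` and `where` steps, here is the same pattern on a tiny made-up array with a different threshold. The numbers are invented and are not Dow data; only the column layout mirrors the exercise.

```python
import numpy as np

VOLUME_COL = 4  # same column position as VOLUME in the exercise

# Four hypothetical rows: open, high, low, close, volume, adj_close.
toy = np.array([
    [10.0, 11.0,  9.0, 10.5, 300.0, 10.5],
    [10.5, 12.0, 10.0, 11.5, 800.0, 11.5],
    [11.5, 11.8, 10.9, 11.0, 450.0, 11.0],
    [11.0, 12.5, 11.0, 12.2, 900.0, 12.2],
])

# Boolean mask: True for rows whose volume exceeds the threshold.
high = toy[:, VOLUME_COL] > 500.0

# True counts as 1, so summing the mask counts the matching rows.
count = high.sum()

# where() returns a tuple of index arrays, one per dimension.
rows_idx = np.where(high)[0]
print(count, rows_idx)
```

The index array can then be used to pick out the matching rows, for example `toy[rows_idx, 5]`, when over-plotting markers.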

## Exercise on Numpy Functions

In [None]:
# Copyright 2016 Enthought, Inc. All Rights Reserved
"""
Wind Statistics
----------------

Topics: Using array methods over different axes, fancy indexing.

1. The data in 'wind.data' has the following format::

        61  1  1 15.04 14.96 13.17  9.29 13.96  9.87 13.67 10.25 10.83 12.58 18.50 15.04
        61  1  2 14.71 16.88 10.83  6.50 12.62  7.67 11.50 10.04  9.79  9.67 17.54 13.83
        61  1  3 18.50 16.88 12.33 10.13 11.17  6.17 11.25  8.04  8.50  7.67 12.75 12.71

   The first three columns are year, month and day.  The
   remaining 12 columns are average windspeeds in knots at 12
   locations in Ireland on that day.

   Use the 'loadtxt' function from numpy to read the data into
   an array.

2. Calculate the min, max and mean windspeeds and standard deviation of the
   windspeeds over all the locations and all the times (a single set of numbers
   for the entire dataset).

3. Calculate the min, max and mean windspeeds and standard deviations of the
   windspeeds at each location over all the days (a different set of numbers
   for each location)

4. Calculate the min, max and mean windspeed and standard deviations of the
   windspeeds across all the locations at each day (a different set of numbers
   for each day)

5. Find the location which has the greatest windspeed on each day (an integer
   column number for each day).

6. Find the year, month and day on which the greatest windspeed was recorded.

7. Find the average windspeed in January for each location.

You should be able to perform all of these operations without using a for
loop or other looping construct.

Bonus
~~~~~

1. Calculate the mean windspeed for each month in the dataset.  Treat
   January 1961 and January 1962 as *different* months. (hint: first find a
   way to create an identifier unique for each month. The second step might
   require a for loop.)

2. Calculate the min, max and mean windspeeds and standard deviations of the
   windspeeds across all locations for each week (assume that the first week
   starts on January 1 1961) for the first 52 weeks. This can be done without
   any for loop.

Bonus Bonus
~~~~~~~~~~~

Calculate the mean windspeed for each month without using a for loop.
(Hint: look at `searchsorted` and `add.reduceat`.)

Notes
~~~~~

These data were analyzed in detail in the following article:

   Haslett, J. and Raftery, A. E. (1989). Space-time Modelling with
   Long-memory Dependence: Assessing Ireland's Wind Power Resource
   (with Discussion). Applied Statistics 38, 1-50.


See :ref:`wind-statistics-solution`.
"""

from numpy import loadtxt
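
Most of the questions above come down to choosing the right `axis` argument. The sketch below shows the pattern on a small invented array (rows are days, columns are locations); it is not the wind data and does not answer the exercise directly.

```python
import numpy as np

# Hypothetical windspeeds: 4 days (rows) at 3 locations (columns).
speeds = np.array([
    [15.0,  9.0, 12.0],
    [14.0, 17.0, 10.0],
    [18.0,  6.0, 11.0],
    [ 7.0,  8.0, 13.0],
])

# No axis: one number for the whole array.
overall_max = speeds.max()

# axis=0 collapses the rows: one value per location.
per_location_mean = speeds.mean(axis=0)

# axis=1 collapses the columns: one value per day.
per_day_max = speeds.max(axis=1)
windiest_location = speeds.argmax(axis=1)

# Position (row, column) of the single largest value.
day_idx, loc_idx = np.unravel_index(speeds.argmax(), speeds.shape)
print(overall_max, day_idx, loc_idx)
```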
