# Python concepts: indexing and assigning

## Assigning
Assignment in Python never copies an object: it binds a name to an existing object, so after `b = a` both names refer to the same object. "Copying" a list creates a new list, but each of its elements is still a reference to the same element object as in the original, so nested containers form a chain of references.  
**Remember: every value in Python is an object, and every name or container slot holds a reference to one.**
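
A minimal sketch of the point above, using throwaway names `a` and `b` chosen only for illustration: assignment binds a second name to the same object, while rebinding a name leaves the object itself untouched.

```python
a = [1, 2, 3]
b = a                 # b is a second name for the same list object
b.append(4)           # mutating through b is visible through a

same_object = b is a
print(same_object)
print(a)

# Rebinding b to a new object does not affect the list that a refers to.
b = [0]
print(a)
```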

### list

`list2 = list1`  
Both names refer to the same list, so any change made through `list2` is visible through `list1`.  

`list2 = list(list1)` or `list2 = list1[:]`  
Creates a new list holding the same elements, so `list1` stays intact when elements of `list2` are added, removed, or replaced.  
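
As a small illustration with a flat list of integers (the names `list1` and `copies` are only for this sketch), each of the usual shallow-copy spellings produces a list that can be changed without touching the original. `list1.copy()` is included as a third equivalent form.

```python
list1 = [1, 2, 3]

# Three common ways to make a shallow copy of a list.
copies = [list(list1), list1[:], list1.copy()]

for c in copies:
    c.append(99)      # changes only the copy

print(list1)
print(copies[0])
print(all(c is not list1 for c in copies))
```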

These two methods, however, make only a shallow copy: if the list holds **mutable** objects, the inner objects are shared between the copies, so mutating one of them in place shows up in every copy.


In [1]:
a = [[1,2],[3],[4]]

b = a[:]
c = list(a)

c[0].append(9)

print('a: {}'.format(a))
print('b: {}'.format(b))
print('c: {}'.format(c))

a: [[1, 2, 9], [3], [4]]
b: [[1, 2, 9], [3], [4]]
c: [[1, 2, 9], [3], [4]]


For a fully independent copy of a nested list, use `copy.deepcopy`.

In [2]:
from copy import deepcopy
a = [[1,2],[3],[4]]

b = a[:]
c = deepcopy(a)

c[0].append(9)

print('a: {}'.format(a))
print('b: {}'.format(b))
print('c: {}'.format(c))


a: [[1, 2], [3], [4]]
b: [[1, 2], [3], [4]]
c: [[1, 2, 9], [3], [4]]


In [3]:
a = [[1,2],[3],[4]]
a[0][0] = 'k'
print(a)

[['k', 2], [3], [4]]


## Indexing

### List  
Indexing a list works much as in C, with the addition of negative indices and slices. Nested lists are indexed by chaining, e.g. `a[0][0]`. List indices must be **integers** or **slices**, not a *tuple* or a *list*.


In [4]:
a = [[1,2],[3],[4]]

print(a[0][0])
print(a[0:2])
# print(a[0:2, :])  -- WRONG 

# print(a[0,0]) -- WRONG tuple  -- x[(exp1, exp2, ..., expN)] is equivalent to x[exp1, exp2, ..., expN]
# print(a[[0,1]]) -- WRONG list

1
[[1, 2], [3]]


Be careful when indexing with slices: a slice that extends past the end of the list raises no "index out of range" error. It is silently clipped to the elements that exist.

In [5]:
print(a[0:100])

[[1, 2], [3], [4]]
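
The contrast with plain integer indexing can be sketched as follows (same list `a` as above; the `raised` flag is only there to record the outcome): an out-of-range slice is clipped, while an out-of-range integer index raises `IndexError`.

```python
a = [[1, 2], [3], [4]]

print(a[1:100])    # clipped to the existing elements
print(a[50:100])   # completely out of range: an empty list, no error

try:
    a[100]         # a single integer index is checked strictly
    raised = False
except IndexError as exc:
    raised = True
    print('IndexError:', exc)
```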


### Numpy array

An ndarray can be indexed with the `x[obj]` syntax, where `x` is the array and `obj` is the selection.

- Field access (`x['field-name']`, used with structured arrays; not covered in this summary)
- Basic slicing: `obj` is a slice `start:stop:step`; the result is a view of the original array
- Advanced indexing:  
    - Integer array indexing  
    - Boolean array indexing
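
The two kinds of advanced indexing listed above can be sketched on a small 1-D array (the array and the names `picked`, `mask` and `large` are made up for this illustration):

```python
import numpy as np

x = np.arange(10, 60, 10)          # array of 10, 20, 30, 40, 50

# Integer array indexing: pick elements by position; repeats are allowed.
picked = x[np.array([0, 3, 3])]

# Boolean array indexing: keep the elements where the mask is True.
mask = x > 25
large = x[mask]

print(picked.tolist())
print(large.tolist())
```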

In [6]:
import numpy as np
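# Ragged input needs dtype=object on recent NumPy versions; the result is a 1-D array of 7 objects
x = np.array([[0, 1, 2],[ 3, 4], 5, 6, 7, 8, 9], dtype=object)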
print(x[1::2])

# negative i is interpreted as n+i where n is the number of elements
print(x[-2:10])

print(x[-3:3:-1])


[[3, 4] 6 8]
[8 9]
[7]


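Use of the `Ellipsis` (`...`) and `newaxis` objects:  
`...` expands to as many `:` objects as are needed to make the selection tuple the same length as `x.ndim`.  
`np.newaxis` inserts a new axis of length 1 at its position in the selection tuple.

The example also shows the difference between slicing and single-integer indexing: a slice keeps the dimension, while an integer removes it.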

In [7]:
x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
print(x.shape)
print(x)
print('----------')

# : is assumed for every dimension that is not indexed explicitly
print('Be extremely careful with dim diff between slicing and integer indexing')
print(x[1:]) # equivalent to x[1:,]; slicing keeps the dimension
print()
print(x[1]) # integer indexing drops the dimension
print()

# an ellipsis object can appear only once in an index
print('\n ellipsis')
print(x[...,0]) # equivalent to the next line
print(x[:,:, 0])


# add newaxis
print('\n add a new axis:')
print(x[:,np.newaxis,:,:])
print(x[:,np.newaxis,:,:].shape)

# repeated indexing: an integer index first, then a full slice of the result

print('repeated')
print(x[1][:]) 
print(x[1, :]) # equivalent
print(x[1:, :])# NOT equivalent

(2, 3, 1)
[[[1]
  [2]
  [3]]

 [[4]
  [5]
  [6]]]
----------
Be extremely careful with dim diff between slicing and integer indexing
[[[4]
  [5]
  [6]]]

[[4]
 [5]
 [6]]


 ellipsis
[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [4 5 6]]

 add a new axis:
[[[[1]
   [2]
   [3]]]


 [[[4]
   [5]
   [6]]]]
(2, 1, 3, 1)
repeated
[[4]
 [5]
 [6]]
[[4]
 [5]
 [6]]
[[[4]
  [5]
  [6]]]


#### Advanced indexing in numpy returns a copy instead of a view
 
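Advanced indexing is triggered when `obj` in `x[obj]` is one of:

- a non-tuple sequence object, e.g. `x[[1,2,3]]` or `x[(1,2,3),]` (but `x[(1,2,3)]` is basic indexing, since it equals `x[1,2,3]`)
- an ndarray of integer or bool dtype
- a tuple with at least one sequence object or ndarray (integer or bool)

Older NumPy versions also treated a list containing a slice, such as `x[[1,2,slice(None)]]`, as if it were a tuple (basic indexing). That form was deprecated in NumPy 1.15 and recent releases no longer accept it, so write the tuple explicitly.
<br>

In short, it is basic indexing if `obj` is a single tuple of integers and slices, or is written without a tuple at all (`x[1,2,3]` is syntactic sugar for `x[(1,2,3)]`).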
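
The view-versus-copy distinction in the heading can be checked directly. This is a minimal sketch on a made-up 1-D array, using `np.shares_memory` to ask whether the result still points into the original buffer:

```python
import numpy as np

x = np.arange(6)

view = x[1:4]            # basic slicing: a view into x
copy = x[[1, 2, 3]]      # advanced (integer array) indexing: a new array

print(np.shares_memory(x, view))
print(np.shares_memory(x, copy))

view[0] = 100            # writes through to x
copy[1] = -1             # leaves x untouched
print(x.tolist())
```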

In [8]:
x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
print('a basic indexing')
print(x[(0,1)])
print(x[0][1])
print(x[0,1,0])
print('an advanced indexing')
print(x[[0,1]])

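print('another basic indexing')
# written as a tuple; older NumPy also accepted the list form x[[1,0,slice(None)]]
print(x[(1,0,slice(None))])
print('another advanced indexing')
# a tuple containing a sequence; older NumPy also accepted x[[(1,0),slice(None)]]
print(x[(1,0),slice(None)])

a basic indexing
[2]
[2]
2
an advanced indexing
[[[1]
  [2]
  [3]]

 [[4]
  [5]
  [6]]]
another basic indexing
[4]
another advanced indexing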
[[[4]
  [5]
  [6]]

 [[1]
  [2]
  [3]]]


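NumPy broadcasting (adjusting shapes so that arrays of different shapes can be operated on together) also applies to index arrays. `np.ix_` builds index arrays that broadcast against each other, so that  
`x[np.ix_(rows, columns)]` selects every combination of the given rows and columns.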


In [9]:
x = np.array([[ 0,  1,  2],
            [ 3,  4,  5],
            [ 6,  7,  8],
            [ 9, 10, 11]])
rows = np.array([[0, 0],
                 [3, 3]], dtype=np.intp)

columns = np.array([[0, 2],
                    [0, 2]], dtype=np.intp)
print(x[rows, columns])


[[ 0  2]
 [ 9 11]]


In [17]:
rows = np.array([0, 3], dtype=np.intp)
columns = np.array([0, 2], dtype=np.intp)

print('No adjustment in row and column shape')
print(x[rows, columns])

# compare with the broadcasting in an operation such as rows[:, np.newaxis] + columns
print('use np.newaxis to row and broadcasting will be applied to columns')
print('rows: {}'.format(rows[:, np.newaxis]))
print(x[rows[:, np.newaxis], columns])


print('using ix_ function')
print(x[np.ix_(rows, columns)])


No adjustment in row and column shape
[ 0 11]
use np.newaxis to row and broadcasting will be applied to columns
rows: [[0]
 [3]]
[[ 0  2]
 [ 9 11]]
using ix_ function
[[ 0  2]
 [ 9 11]]
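
To see why the last two forms agree, it may help to look at what `np.ix_` returns. In this sketch (same `x`, `rows` and `columns` values as above; `row_idx`, `col_idx`, `block` and `same` are names introduced here), the result is a tuple of index arrays whose shapes are set up to broadcast against each other.

```python
import numpy as np

x = np.arange(12).reshape(4, 3)
rows = np.array([0, 3])
columns = np.array([0, 2])

# np.ix_ reshapes each 1-D index so that together they broadcast to a grid.
row_idx, col_idx = np.ix_(rows, columns)
print(row_idx.shape, col_idx.shape)

block = x[row_idx, col_idx]
same = np.array_equal(block, x[rows[:, np.newaxis], columns])
print(block.tolist())
print(same)
```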



`x.flat` is a 1-D iterator over the array, not a copy: it visits every element in C-contiguous order (the last index varying fastest) and also supports indexing and assignment.
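
A short sketch of `x.flat` on a made-up 2x3 array. The values are converted with `int()` only to keep the printed list plain.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

# Iteration follows C order: the last index varies fastest.
values = [int(v) for v in x.flat]
print(values)

# Indexing and assignment through .flat act on the original array.
fifth = int(x.flat[4])
x.flat[4] = 50
print(fifth, x.tolist())
```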


### Python pandas dataframe
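`loc`, `iloc` and `ix`  
are the indexers to use when selecting by row and column at the same time, in the way `a[0][0]` is used for a nested list. `loc` selects by label and `iloc` by integer position; `ix` mixed the two, was deprecated in pandas 0.20, and was removed in pandas 1.0.  
They accept lists, slices, and single labels/integers.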
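
The difference between label-based and position-based selection is easiest to see when the index is not `0, 1, 2`. The frame below is a made-up example with index labels `10, 20, 30` chosen for that purpose:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                   'col2': [1, 2, 3]},
                  index=[10, 20, 30])

by_label = df.loc[20, 'col1']     # row label 20, column label 'col1'
by_position = df.iloc[1, 0]       # second row, first column

# Label slices include both endpoints; position slices exclude the stop.
label_slice = df.loc[10:20]
position_slice = df.iloc[0:1]

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```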

In [11]:
import pandas as pd


In [12]:
df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                  'col2': [1, 2, 3]})

print('df_initial:\n {}\n'.format(df))
df.loc[1, 'col1'] = 'd'
print('df_after:\n {}\n'.format(df))

df_initial:
   col1  col2
0    a     1
1    b     2
2    c     3

df_after:
   col1  col2
0    a     1
1    d     2
2    c     3



In [13]:
# indexing using list and slices
print(df.loc[[0,1],'col1':'col2'])


  col1  col2
0    a     1
1    d     2


In [14]:
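# ('col1') is just the string 'col1' (parentheses alone do not make a tuple), so this returns a Series;
# dtype: object is simply the dtype of the string column
print(df.loc[0:1, ('col1')])
print()
# a real tuple of labels selected both columns here; a list ['col1', 'col2'] is the unambiguous form
print(df.loc[0:1, ('col1','col2')])

0    a
1    d
Name: col1, dtype: object

  col1  col2
0    a     1
1    d     2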


**Careful! Returning a view versus a copy**
When setting values in a pandas object, care must be taken to avoid what is called **chained indexing**.

In [15]:

df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                  'col2': [1, 2, 3]})

print('df_initial:\n {}\n'.format(df))
df['col1'][0] = 'd'  # chained indexing: pandas does not guarantee that the value is actually set on df

print('df_after:\n {}\n'.format(df))




df_initial:
   col1  col2
0    a     1
1    b     2
2    c     3

df_after:
   col1  col2
0    d     1
1    b     2
2    c     3



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
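
A minimal sketch of the alternative that the warning points to, assuming the same small frame as above: a single `.loc` call with both the row and the column indexer sets the value on `df` itself.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                   'col2': [1, 2, 3]})

# One indexing operation instead of the chained df['col1'][0] = 'd'
df.loc[0, 'col1'] = 'd'

print(df)
```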
  


`df.loc[:,('one','second')]` passes a nested tuple of `(slice(None),('one','second'))` to a single call to `__getitem__`. This allows pandas to deal with this as a single entity.

In [16]:
# An example from the pandas documentation
dfmi = pd.DataFrame([list('abcd'),
                       list('efgh'),
                        list('ijkl'),
                         list('mnop')],
                       columns=pd.MultiIndex.from_product([['one','two'],
                                                          ['first','second']]))
   

value=2

dfmi.loc[:,('one','second')] = value
# becomes
dfmi.loc.__setitem__((slice(None), ('one', 'second')), value)


dfmi['one']['second'] = value
# becomes
dfmi.__getitem__('one').__setitem__('second', value)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Outside of simple cases, it is very hard to predict whether the first `__getitem__` will return a view or a copy (it depends on the memory layout of the array, about which pandas makes no guarantees), and therefore whether the `__setitem__` will modify `dfmi` or a temporary object that gets thrown out immediately afterward.


But `dfmi.loc` is guaranteed to be `dfmi` itself with modified indexing behavior, so `dfmi.loc.__getitem__` / `dfmi.loc.__setitem__` operate on `dfmi` directly. Of course, `dfmi.loc.__getitem__(idx)` may be a view or a copy of `dfmi`.
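
As a small check of the single-call form, the sketch below rebuilds the MultiIndex frame from the documentation example and sets one column through `.loc`. The value `'X'` is an arbitrary placeholder chosen for this illustration.

```python
import pandas as pd

dfmi = pd.DataFrame([list('abcd'),
                     list('efgh'),
                     list('ijkl'),
                     list('mnop')],
                    columns=pd.MultiIndex.from_product([['one', 'two'],
                                                        ['first', 'second']]))

# A single .loc call: the assignment acts on dfmi itself.
dfmi.loc[:, ('one', 'second')] = 'X'

updated = dfmi[('one', 'second')].tolist()
untouched = dfmi[('one', 'first')].tolist()
print(updated)
print(untouched)
```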
 