So far, I've moved some of the constructors into their own types. I think our weights will be clearer and more consistent if:

- `W` is the most generic type of weights object, and reflects arbitrarily constructed weights built from at least one `neighbor` dictionary.
- `Rook`, `Queen`, `DistanceBand`, `Kernel`, etc. reflect basic types of either `neighbor` construction or `weights` construction.
- We can make flyweight classes (like `AdaptiveKernel`) that just call their parent (`Kernel`) with special options, and that's fine.
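
The hierarchy in the list above could be sketched roughly as follows. This is a minimal illustration, not the `weights2` code: the constructor signatures, the `fixed` flag, the 1-D points, and the triangular kernel are all assumptions made for the example, and the points are assumed to be distinct.

```python
class W(object):
    """Most generic weights object: wraps a neighbors dictionary."""

    def __init__(self, neighbors, weights=None):
        self.neighbors = neighbors
        # Default to binary weights when none are supplied.
        if weights is None:
            weights = {i: [1.0] * len(js) for i, js in neighbors.items()}
        self.weights = weights


class Kernel(W):
    """Hypothetical kernel weights over distinct 1-D points."""

    def __init__(self, points, k=2, fixed=True):
        # For each point, keep the k nearest (distance, index) pairs.
        ranked = {
            i: sorted((abs(p - q), j) for j, q in enumerate(points) if j != i)[:k]
            for i, p in enumerate(points)
        }
        # Distance from each point to its k-th nearest neighbor.
        kth = {i: pairs[-1][0] for i, pairs in ranked.items()}
        widest = max(kth.values())
        # Fixed: one global bandwidth. Adaptive: one bandwidth per point.
        self.bandwidth = {i: (widest if fixed else kth[i]) for i in ranked}
        neighbors = {i: [j for _, j in pairs] for i, pairs in ranked.items()}
        # Triangular kernel: weight falls linearly to 0 at the bandwidth.
        weights = {
            i: [1.0 - d / self.bandwidth[i] for d, _ in pairs]
            for i, pairs in ranked.items()
        }
        W.__init__(self, neighbors, weights)


class AdaptiveKernel(Kernel):
    """Flyweight: pins one option and defers everything to the parent."""

    def __init__(self, points, k=2):
        Kernel.__init__(self, points, k=k, fixed=False)
```

The flyweight adds no behavior of its own, so anything that accepts a `Kernel` (or a `W`) in this sketch also accepts an `AdaptiveKernel`.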

In [1]:
import pysal as ps
import numpy as np

I've been doing this cleanup & new work in parallel to `weights`, over in `weights2`. 

In [2]:
from pysal.weights2 import contiguity as cont, distance as dist
from pysal.weights2.weights import W as W2

In [3]:
path = ps.examples.get_path('south.shp')
psframe = ps.pdio.read_files(path)
filehandler = ps.open(path)

In [4]:
try:
    import shapely.geometry as geom
    sh_iterable = [geom.asShape(sh) for sh in psframe.geometry]
except ImportError:
    from warnings import warn as Warn
    Warn('No shapely :(')

### Contiguity Methods

Both Rook and Queen provide the `from_iterable`, `from_dataframe`, and `from_shapefile` constructors shown below, and I'm working on the equivalents for the Distance methods.

In [5]:
R1 = cont.Rook.from_iterable(sh_iterable)
R2 = cont.Rook.from_dataframe(psframe)
R3 = cont.Rook.from_shapefile(path)
Rreference = ps.rook_from_shapefile(path)

In [6]:
R1.neighbors == R2.neighbors == R3.neighbors == Rreference.neighbors

True

In [7]:
Q1 = cont.Queen.from_iterable(sh_iterable)
Q2 = cont.Queen.from_dataframe(psframe)
Q3 = cont.Queen.from_shapefile(path)
Qreference = ps.queen_from_shapefile(path)

In [8]:
Q1.neighbors == Q2.neighbors == Q3.neighbors == Qreference.neighbors

True
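
One way to guarantee that the `from_*` constructors compared above agree is to have them all delegate to a single routine. A minimal sketch, assuming polygons are plain lists of vertex tuples and that sharing at least one vertex (Queen) or at least two vertices (Rook, as a simplification of "shares an edge") counts as contiguity. `TinyContiguity`, `from_records`, and `min_shared` are hypothetical names, not the `weights2` API:

```python
from itertools import combinations


class TinyContiguity(object):
    """Hypothetical stand-in for a contiguity weights class."""

    min_shared = 1  # vertices two polygons must share to be neighbors

    def __init__(self, neighbors):
        self.neighbors = neighbors

    @classmethod
    def from_iterable(cls, polygons, ids=None):
        # Every other constructor funnels into this one.
        polygons = [set(poly) for poly in polygons]
        ids = list(ids) if ids is not None else list(range(len(polygons)))
        neighbors = {i: [] for i in ids}
        for (a, pa), (b, pb) in combinations(zip(ids, polygons), 2):
            if len(pa & pb) >= cls.min_shared:
                neighbors[a].append(b)
                neighbors[b].append(a)
        return cls(neighbors)

    @classmethod
    def from_records(cls, records, geom_key="geometry", id_key=None):
        # Assumed analogue of `from_dataframe`: pull out the geometry column.
        ids = [r[id_key] for r in records] if id_key else None
        return cls.from_iterable([r[geom_key] for r in records], ids=ids)


class TinyQueen(TinyContiguity):
    min_shared = 1  # a shared corner is enough


class TinyRook(TinyContiguity):
    min_shared = 2  # simplification: two shared vertices imply a shared edge


# Three unit squares: b sits to the right of a, c sits on top of b.
a = [(0, 0), (1, 0), (1, 1), (0, 1)]
b = [(1, 0), (2, 0), (2, 1), (1, 1)]
c = [(1, 1), (2, 1), (2, 2), (1, 2)]
rook = TinyRook.from_iterable([a, b, c])
queen = TinyQueen.from_iterable([a, b, c])
```

Here `a` and `c` touch only at the corner `(1, 1)`, so they are neighbors under the Queen criterion but not under the Rook criterion.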

### Distance Methods

For these, the `from_iterable` equivalent is `from_array`, so users are expected to be able to convert shapes to an array of representative points where that is appropriate. I believe just extracting the centroid works for shapely shapes as well.

In [9]:
from pysal.weights2.util import get_points_array

In [10]:
point_array = psframe.geometry.apply(lambda x : np.array(x.centroid)).values
point_array = np.vstack(point_array)

In [11]:
point_array2 = [np.array(x.centroid) for x in sh_iterable]
point_array2 = np.vstack(point_array2)

In [12]:
point_array3 = get_points_array(psframe.geometry)
point_array4 = get_points_array(sh_iterable)
point_array5 = get_points_array(filehandler)

In [13]:
filehandler.seek(0)

In [14]:
np.testing.assert_allclose(point_array, point_array2)
np.testing.assert_allclose(point_array2, point_array3)
np.testing.assert_allclose(point_array3, point_array4)
np.testing.assert_allclose(point_array4, point_array5)

In [15]:
DB1 = dist.DistanceBand.from_dataframe(psframe, threshold=5)
DB2 = dist.DistanceBand.from_array(point_array, threshold=5)
DB3 = dist.DistanceBand.from_shapefile(path, threshold=5)

In [16]:
DBref = ps.DistanceBand(point_array, 5)

In [17]:
for a,b,c,d in zip(DB1, DB2, DB3, DBref):
    assert a == b
    assert b == c
    assert c == d

In [18]:
KW1 = dist.Kernel.from_dataframe(psframe, k=4, function='gaussian')
KW2 = dist.Kernel.from_array(point_array, k=4, function='gaussian')
KW3 = dist.Kernel.from_shapefile(path, k=4, function='gaussian')
KWref = ps.kernelW_from_shapefile(path, k=4, function='gaussian')

In [19]:
for a,b,c,d in zip(KW1, KW2, KW3, KWref):
    assert a == b
    assert b == c
    assert c == d

In [20]:
KNN1 = dist.KNN.from_dataframe(psframe, k=4)
KNN2 = dist.KNN.from_array(point_array, k=4)
KNN3 = dist.KNN.from_shapefile(path, k=4)
KNNref = ps.knnW_from_shapefile(path, k=4)

In [21]:
for a,b,c,d in zip(KNN1, KNN2, KNN3, KNNref):
    assert a == b
    assert b == c
    assert c == d

To address concerns that repeated KNN queries on the same array are inefficient, I did something similar to what I did in `MapClassify`:

In [22]:
# Make a new W with k=9 without recomputing the kdtree.
KNN_k9 = KNN1.reweight(k=9, inplace=False)
# Modify KNN1 in place, changing k from 4 to 2.
KNN1.reweight(k=2, inplace=True)

In [23]:
KNN1.histogram

[(2, 1412)]

In [24]:
KNN_k9.histogram

[(9, 1412)]

In [25]:
KNN_k9.kdtree is KNN1.kdtree

True

😊
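
The idea behind `reweight` can be sketched without PySAL: pay for the expensive neighbor search once, then derive each `k` from the cached result. In the sketch below, a brute-force ranking of 1-D points stands in for the kd-tree, and `TinyKNN` with its `reweight` signature is a hypothetical illustration rather than the `weights2` implementation:

```python
class TinyKNN(object):
    """Hypothetical k-nearest-neighbor weights over 1-D points."""

    def __init__(self, points, k=2, _ranking=None):
        # The ranking plays the role of the kd-tree: built once, then shared.
        if _ranking is None:
            _ranking = {
                i: [j for _, j in sorted((abs(p - q), j)
                                         for j, q in enumerate(points) if j != i)]
                for i, p in enumerate(points)
            }
        self.points = points
        self.ranking = _ranking
        self.k = k
        self.neighbors = {i: order[:k] for i, order in _ranking.items()}

    def reweight(self, k, inplace=True):
        if inplace:
            # Re-slice the cached ranking; nothing is searched again.
            self.k = k
            self.neighbors = {i: o[:k] for i, o in self.ranking.items()}
            return None
        # A new object that shares the cached ranking with this one.
        return TinyKNN(self.points, k=k, _ranking=self.ranking)

    @property
    def histogram(self):
        # (number of neighbors, number of observations) pairs.
        counts = {}
        for js in self.neighbors.values():
            counts[len(js)] = counts.get(len(js), 0) + 1
        return sorted(counts.items())


knn = TinyKNN([0.0, 1.0, 3.0, 7.0, 12.0], k=3)
knn_k4 = knn.reweight(k=4, inplace=False)  # new object, same cached ranking
knn.reweight(k=2, inplace=True)            # same object, fewer neighbors
```

Because both objects hold a reference to the same ranking, changing `k` costs a slice per observation rather than a new search.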

### Confusion about ordering

What I'm still a little confused about is the use of `id_order`. For example, if we build weights from a dataframe in memory, we'd probably want the weights to iterate in the same order as the dataframe's rows.

In [26]:
Q2o = cont.Queen.from_dataframe(psframe, idVariable='FIPS')

In [27]:
Q2o.id_order[0:5]

[u'01001', u'01003', u'01005', u'01007', u'01009']

In [28]:
Q2o.id_order_set

False

But using an id column when constructing weights from a file removes the sorting on `id_order`:

In [29]:
Q2o_ref = ps.queen_from_shapefile(path, idVariable='FIPS')

In [30]:
Q2o_ref.id_order[0:5]

[u'54029', u'54009', u'54069', u'54051', u'10003']

Note that `id_order_set` is still `False`:

In [31]:
Q2o_ref.id_order_set 

False

To me, this is confusing, since the docstring would lead you to believe that either:
- `id_order` is unspecified, defaults to something like `lexsort(ids)`, and `id_order_set` is `False`
- `id_order` is specified arbitrarily, and `id_order_set` is `True`. 

As I see it, `queen_from_shapefile` sets the id order using `remap_ids` (so the default lexicographic ordering is gone), but does not set `id_order_set=True`. This is only problematic because I want `from_dataframe` and `from_shapefile` to return the same values when we don't reindex the dataframe.
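
The contract that the docstring suggests can be written down as a small sketch. `TinyW` is a hypothetical class, not PySAL's `W`; it assumes the default order is a plain `sorted` of the ids and that any explicit assignment should flip the flag:

```python
class TinyW(object):
    """Hypothetical weights object showing the expected id_order contract."""

    def __init__(self, neighbors, id_order=None):
        self.neighbors = neighbors
        self._id_order = sorted(neighbors)  # default: sorted ids
        self.id_order_set = False
        if id_order is not None:
            self.id_order = id_order  # goes through the setter below

    @property
    def id_order(self):
        return self._id_order

    @id_order.setter
    def id_order(self, order):
        if set(order) != set(self.neighbors):
            raise ValueError("id_order must contain exactly the ids in neighbors")
        self._id_order = list(order)
        self.id_order_set = True  # an explicit order was requested


neighbors = {"b": ["a"], "a": ["b"], "c": []}
default = TinyW(neighbors)
explicit = TinyW(neighbors, id_order=["c", "a", "b"])
```

Under this contract, any constructor that imposes a non-default order would report `id_order_set` as `True`, and the flag alone would tell you which of the two cases you are in.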

To replicate the `from_shapefile` order, `from_dataframe` must be passed something like:

In [32]:
Q2osort = cont.Queen.from_dataframe(psframe, idVariable='FIPS', id_order=True)

In [33]:
Q2osort.id_order[0:5]

[u'54029', u'54009', u'54069', u'54051', u'10003']

This, again, reads as a request to use the `idVariable` column and treat its row order as the order the weights should be in. That is what `idVariable` currently does in `from_shapefile`, except that `from_shapefile` does not set `id_order_set`.

In [34]:
Q2osort.id_order_set

True

In [35]:
Q2o_ref.id_order_set

False