# Filtering rows by location within an area

Here is another interesting example, based on code that was used to process real business data.

Now then, we have a table of orders with their coordinates (latitude and longitude), and we want to find out which orders are located in a given area (a particular district of a city). Let's first describe this area. In general, it looks like this:

![Area](data/ce2-1.png)

This irregular hexagon may look a little complicated at first sight, but it can easily be divided into 3 triangles, as shown by the white lines. Checking whether a point is inside a triangle is easy, too. Here I use my own simple code to describe points and triangles and to check whether a point is inside a triangle:

In [1]:
from collections import namedtuple

Point = namedtuple('Point', 'x y')

class Triangle:
    def __init__(self, p1, p2, p3):
        self.p1 = p1
        self.p2 = p2
        self.p3 = p3

    @staticmethod
    def calc_area(p1, p2, p3):
        # Half the absolute value of the cross product of two edge vectors
        return abs((p1.x - p3.x) * (p2.y - p1.y) - (p1.x - p2.x) * (p3.y - p1.y)) / 2.0

    def area(self):
        return self.calc_area(self.p1, self.p2, self.p3)

    def is_inside(self, pnt):
        # If the point is inside the triangle, it splits it into 3 smaller
        # triangles whose areas sum to the area of the original triangle;
        # for a point outside, the sum is larger
        A = self.area()
        A1 = self.calc_area(pnt, self.p2, self.p3)
        A2 = self.calc_area(self.p1, pnt, self.p3)
        A3 = self.calc_area(self.p1, self.p2, pnt)
        # A1 + A2 + A3 should equal A; allow a tolerance of 1% of A
        # to absorb floating-point error
        return A1 + A2 + A3 - A < A / 100
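To see why the area test works and what the 1% tolerance does in practice, here is a minimal standalone sketch of the same area-sum idea on plain `(x, y)` tuples. The function names `triangle_area`, `area_excess` and `inside_by_area` are hypothetical, not part of the notebook. For any point, the three sub-triangles cover the original one, so their total area is never smaller than `A`: it equals `A` for points inside and grows as the point moves away.

```python
def triangle_area(p1, p2, p3):
    """Area of a triangle given as three (x, y) tuples."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Half the absolute cross product of the edge vectors (p2 - p1) and (p3 - p1)
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0


def area_excess(tri, p):
    """How much the three sub-triangles exceed the original area (0 when p is inside)."""
    a, b, c = tri
    parts = triangle_area(p, b, c) + triangle_area(a, p, c) + triangle_area(a, b, p)
    return parts - triangle_area(a, b, c)


def inside_by_area(tri, p, rel_tol=0.01):
    """Area-sum test with a relative tolerance, mirroring Triangle.is_inside."""
    return area_excess(tri, p) < triangle_area(*tri) * rel_tol


tri = ((0, 0), (4, 0), (0, 4))  # right triangle with area 8

print(area_excess(tri, (1, 1)))   # 4 + 2 + 2 - 8 = 0.0, inside
print(area_excess(tri, (3, 3)))   # 4 + 6 + 6 - 8 = 8.0, outside
print(inside_by_area(tri, (2.005, 2.005)))  # just outside x + y = 4, yet True
```

The last point lies just beyond the hypotenuse `x + y = 4`, but its excess of about 0.04 is below the 0.08 allowed by the 1% tolerance, so it counts as inside. For a city district that is usually harmless; it simply makes the boundary slightly fuzzy.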

So, now we can describe the area using points and triangles:

In [2]:
A, B = Point(47.571537, 42.5274088), Point(47.5846214, 42.557278)
C, D, E = Point(47.5216788, 42.5436832), Point(47.5354716, 42.573346), Point(47.5505958, 42.6069556)
F = Point(47.490442, 42.6267124)
shape = [Triangle(C, E, F), Triangle(A, B, D), Triangle(B, D, E)]

And the function to check if a point is inside any triangle:

In [3]:
def is_inside_shape(pnt):
    # True as soon as any of the triangles contains the point
    return any(triangle.is_inside(pnt) for triangle in shape)
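The triangle decomposition above is done by hand. A commonly used alternative that tests the polygon directly is the even-odd (ray-casting) rule: cast a horizontal ray from the point and count how many polygon edges it crosses; an odd count means inside. This is a different technique from the one used in this notebook, sketched here only for comparison. It assumes the vertices are listed in boundary order; the text does not give that order for points A to F, so the example uses a made-up L-shaped polygon.

```python
def point_in_polygon(x, y, vertices):
    """Even-odd rule: True if (x, y) lies inside the polygon.

    `vertices` must list the corners in boundary order (clockwise or
    counter-clockwise); the last corner connects back to the first.
    Points exactly on an edge may be reported either way.
    """
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        # (this also skips horizontal edges, so there is no division by zero)
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings of the ray going from (x, y) towards +infinity in x
            if x < x_cross:
                inside = not inside
    return inside


# Made-up L-shaped polygon, for illustration only
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]

print(point_in_polygon(1, 1, l_shape))  # True
print(point_in_polygon(1, 3, l_shape))  # True
print(point_in_polygon(3, 3, l_shape))  # False, in the notch of the L
print(point_in_polygon(5, 1, l_shape))  # False
```

Unlike the triangle version, this needs neither a tolerance nor a manual triangulation; the price is that the vertex order must be known. With the real data, the vertices would be the `(lat, lon)` pairs of the area points in the order they appear around the boundary.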

Ok, let's process the data!

In [4]:
from lemuras import Table

orders = Table.from_csv('data/orders-pos.csv', False)
orders

| order_id | client_id | total_sum | date_created | lat | lon |
|---|---|---|---|---|---|
| 79938 | 459583 | 988.0 | 2018-01-02 05:09:30 | 47.661152400000006 | 62.2224172 |
| 79954 | 305848 | 1450.0 | 2018-01-02 07:11:45 | 47.619584999999994 | 62.1688624 |
| 79950 | 2943832 | 904.0 | 2018-01-02 07:14:01 | 49.859165000000004 | 47.653378 |
| 79957 | 668815 | 1622.8 | 2018-01-02 07:27:52 | 51.849238400000004 | 44.2043104 |
| ... | ... | ... | ... | ... | ... |


The filtering code:

In [5]:
def check_row(row):
    # Rows without coordinates can't be located, so treat them as outside
    if row['lat'] is None or row['lon'] is None:
        return False
    return is_inside_shape(Point(row['lat'], row['lon']))

# The `handle` function works like `apply`,
# but passes entire rows to the function
checker = orders.handle(check_row)

# Or, in functional style if you prefer:
checker = orders.handle(lambda row: False if row['lat'] is None or row['lon'] is None
                        else is_inside_shape(Point(row['lat'], row['lon'])))

checker

That's it! 1% of the orders are located in the area.

In [6]:
filtered = orders.loc(checker)
filtered

| order_id | client_id | total_sum | date_created | lat | lon |
|---|---|---|---|---|---|
| 79977 | 419155 | 730.0 | 2018-01-02 12:41:45 | 47.5291702 | 42.59985880000001 |
| 79323 | 2906574 | 1102.0 | 2018-01-03 15:35:12 | 47.517124599999995 | 42.579290799999995 |
| 79410 | 419155 | 1036.0 | 2018-01-05 15:23:47 | 47.5291702 | 42.59985880000001 |
| 79840 | 2966116 | 478.0 | 2018-01-07 09:36:14 | 47.5317938 | 42.568736799999996 |
| ... | ... | ... | ... | ... | ... |
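The `handle`/`loc` pair above builds a per-row flag column and then keeps the flagged rows. For readers without lemuras, here is a minimal plain-Python sketch of the same two steps using the standard `csv` module. It is an analogue, not how lemuras implements these methods. The sample rows are copied from the two output tables above, and the longitude-band predicate `in_band` is a hypothetical stand-in for `is_inside_shape`, used only to keep the example short.

```python
import csv
import io

# Four rows copied from the output tables above: the first two lie far from
# the area, the last two come from the filtered result
CSV_TEXT = """\
order_id,client_id,total_sum,date_created,lat,lon
79938,459583,988.0,2018-01-02 05:09:30,47.661152400000006,62.2224172
79954,305848,1450.0,2018-01-02 07:11:45,47.619584999999994,62.1688624
79977,419155,730.0,2018-01-02 12:41:45,47.5291702,42.59985880000001
79323,2906574,1102.0,2018-01-03 15:35:12,47.517124599999995,42.579290799999995
"""


def to_float(value):
    """Parse a coordinate cell; an empty or missing cell becomes None."""
    return None if value in ('', None) else float(value)


def load_rows(text):
    """Read CSV text into a list of dicts with numeric lat/lon."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row['lat'] = to_float(row['lat'])
        row['lon'] = to_float(row['lon'])
    return rows


def make_mask(rows, inside):
    """One boolean per row, like `checker`; rows without coordinates get False."""
    return [
        bool(row['lat'] is not None and row['lon'] is not None
             and inside(row['lat'], row['lon']))
        for row in rows
    ]


def select(rows, mask):
    """Keep the rows whose flag is True, like `orders.loc(checker)`."""
    return [row for row, keep in zip(rows, mask) if keep]


# Hypothetical stand-in for is_inside_shape: a simple longitude band
def in_band(lat, lon):
    return 42.5 < lon < 42.7


rows = load_rows(CSV_TEXT)
mask = make_mask(rows, in_band)
print(sum(mask) / len(mask))  # share of rows inside: 0.5 for this sample
print([row['order_id'] for row in select(rows, mask)])  # ['79977', '79323']
```

Because `True` counts as 1, `sum(mask) / len(mask)` gives the share of matching rows directly, which is how a figure like the 1% above can be computed from the flag column.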
