
Implement data passing functions #19

Closed
EntilZha opened this issue Mar 23, 2015 · 8 comments

@EntilZha
Owner

So far the only way to ingest data into ScalaFunctional is to pass in Python data structures built beforehand. It would be helpful to be able to read directly from data formats such as JSON/SQL/CSV.

The target milestone for completing everything is 0.4.0.

This issue will serve as the parent issue for implementing each specific function; a sketch of the target API follows the list below.

Child issues:
#34 seq.open
#35 seq.range
#36 seq.csv
#37 seq.jsonl
#29 seq.json
#30 to_json
#31 to_csv
#32 to_file
#33 to_jsonl
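
For orientation, here is a rough sketch of what the target API might look like once the child issues above land (the entrypoint names are taken from the issue list; none of them exist yet, and the filenames are illustrative):

from functional import seq

seq.range(10)               # 35: like the builtin range
seq.open('events.log')      # 34: lines of a file
seq.csv('people.csv')       # 36: parsed CSV rows
seq.json('people.json')     # 29: parsed JSON
seq.jsonl('events.jsonl')   # 37: one JSON document per line

# Write-side counterparts (#30-#33) as actions on a sequence
seq.csv('people.csv').to_json('people.json')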

@EntilZha EntilZha self-assigned this Mar 23, 2015
@EntilZha EntilZha modified the milestones: 0.3.0, 0.4.0 Mar 23, 2015
@EntilZha EntilZha removed their assignment Apr 24, 2015
@ChuyuHsu
Contributor

Hi @EntilZha,
I really like this project.
Have you stopped updating it, or do you have a better alternative?

@EntilZha
Owner Author

I haven't stopped using the package. I actually still use it quite a lot, so most of the improvements/additions happen when I feel something is missing and just add it. It's been working fairly well for me, although recently I have been thinking about adding something to help with opening/closing files, since I seem to be doing that a lot.

If you have ideas/suggestions for things you think are important, definitely let me know. The project is definitely not dead; it has just reached a place where it is actually working pretty well.

@EntilZha
Owner Author

Here is what I am thinking of doing:

  1. The general abstraction is having multiple input streams/entrypoints instead of only seq.
  2. I want to keep the import down to only from functional import seq, rather than requiring a separate import for each type of input stream or importing a stream module. That is, I want to avoid needing from functional import seq; from functional import streams; streams.json("").map....
  3. This can be resolved by setting attributes on seq so that streams can be accessed via seq.json or seq.range.
  4. However, the streams will be implemented in a separate module so that they are still importable separately or all together. This probably also means moving seq into that same module, which is a more logical place for it than where it currently lives anyway. I need to decide whether to preserve from functional.chain import seq; I am inclined not to, since it is not the official way to import and the package is technically still pre-1.0.
  5. To start, I think a good list of stream sources is: csv, json, reading lines from a file, reading an entire file and splitting on a delimiter, and things like range. A sketch of the resulting import surface follows this list.
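
To make points 2-4 concrete, a quick sketch of the two import surfaces (module and function names mirror the plan above and are not final):

# What I want to avoid: a second import just for the streams module
from functional import streams
streams.json('people.json').map(lambda record: record)

# What I want instead: one import, with streams hanging off seq as attributes
from functional import seq
seq.json('people.json').map(lambda record: record)
seq.range(10).sum()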

@EntilZha
Owner Author

seq.open (and equivalently streams.open) has been implemented and tested in 207b42b
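
For reference, a small usage sketch (assuming, per the plan above, that seq.open reads a file and produces a sequence of its lines; the filename is illustrative):

from functional import seq

# Take the first five stripped lines of a file
seq.open('server.log').map(lambda line: line.strip()).take(5)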

@ChuyuHsu
Contributor

@EntilZha, I have thought about it.
From the OOP perspective, the single responsibility principle is the reason the factory pattern exists.
Suppose you implemented the data-reading methods directly in seq. Then you would have to keep modifying and re-testing the seq code every time you add a new data source.
I think that is the reason an rdd is usually created by a factory, SparkContext, and a pandas.DataFrame is created by pd.read_table, etc.
Even the Scala object definition is usually understood as a "factory".

But I can understand the effort you are making to eliminate multiple imports.
If the list of stream sources stays short, that will be fine. In the long term, however, maintaining seq will be painful.
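
To make the factory alternative concrete, a rough sketch (FunctionalContext is a hypothetical name used only for illustration, analogous to SparkContext):

class FunctionalContext:
    # Each data source gets its own reader on the factory, so adding a
    # new source means adding a method here rather than touching seq.
    def open(self, path):
        pass  # read lines from path and return a sequence

    def csv(self, path):
        pass  # parse CSV rows from path and return a sequence

fc = FunctionalContext()
fc.csv('people.csv')  # analogous to sc.textFile(...) producing an RDD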

@EntilZha
Owner Author

I think there is a slight confusion (if not, I would be happy to be corrected). The definition of the seq function stays the same. However, since functions are objects in Python, I can set attributes on them, so I am setting attributes on seq that point to the functions implementing these other stream operations. Effectively this creates a convenient alias. The short version looks like the code below, but the specific code implementing this is here: https://github.com/EntilZha/ScalaFunctional/blob/master/functional/streams.py

def seq(input):
    # Implementation of seq goes here
    pass

def open(input):
    # Implementation of open goes here
    pass

# Functions are objects, so open can be attached as an attribute of seq
seq.open = open

# In code using functional
from functional import seq
seq.open('filename').....
seq(regular_input).....

I am currently working on implementations of the stream functions, which do the necessary preprocessing and then hand the resulting ordinary Python sequence to seq to turn into a functional.pipelines.Sequence.
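
For example, a stream function for CSV might look roughly like this (a sketch under the assumption that it parses the file and delegates to seq; this is not the actual implementation):

import csv as csv_module

from functional import seq

def csv(csv_file):
    # Preprocessing: parse the file into a plain list of rows
    with open(csv_file) as handle:
        rows = list(csv_module.reader(handle))
    # Hand the ordinary Python sequence to seq to build a Sequence
    return seq(rows)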

I don't know enough about pandas, but at least for Spark (I think) it's mostly that SparkContext carries lots of information about the execution context, which isn't as applicable here.

@ChuyuHsu
Contributor

Okay, got it.
That is beautiful.

P.S. By the Spark part I mentioned earlier, I actually meant that sc.textFile("/path/to/file") creates an RDD, rather than the RDD creating itself.

@EntilZha
Copy link
Owner Author

EntilZha commented Nov 1, 2015

Now closing this since all of its child issues have been implemented and closed. With this resolved, I am getting very close to releasing 0.4.0 after working on the documentation a bit.
