Storing bid ask dataframes into csv files #2

Open
BGTCapital opened this issue Dec 11, 2021 · 2 comments
Labels
question Further information is requested

Comments

@BGTCapital

Hi JoshLove
Thank you very much for the great work that you have done with the Dashboard. I am new to Python, but I am trying to contribute a feature that stores all bid/ask data in CSV files for future use.
I am trying to save the dataframes to a CSV file in this format:
sample_ (1).csv
How would you suggest doing that in real time, so that no data point is lost?
Thank you again for this great work and your support.
Best
BGT

@jlove-dev
Owner

jlove-dev commented Dec 11, 2021

Hi @BGTCapital - let's see if I can answer this. I can answer this in two ways:

  1. Do you need the dashboard along with this? If you don't (you're just capturing data), I'd recommend using either my other repo (https://github.com/JoshLove-portfolio/Coinbase_L2_Socket_Lite) or cryptofeed (https://github.com/bmoscon/cryptofeed). Cryptofeed is how I got started on this project. It has more features, but my version is lighter on resources. Coinbase_L2_Socket_Lite will also be a lot easier for you to work with since it's rather minimal code, so you should be able to hop right in and insert the logic you need. If you decide to use Coinbase_L2_Socket_Lite, open this same issue over there and I'll close this one.

Looking at the CSV part: writing to a CSV can be done with pandas! Documentation on this can be found here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html and here: https://datascienceparichay.com/article/pandas-append-dataframe-to-existing-csv/

Pandas is used in both of my repos, so you should be able to take the dataframe once it's built and immediately append it to the CSV. For your use case you'll want to_csv in append mode (mode='a'), which is shown in the second link. Bear in mind that writing on every update might slow things down if computer resources are an issue.
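A minimal sketch of the append approach, assuming the dataframe has the same columns on every call. The helper name append_snapshot and the file path are made up for illustration and are not part of either repo. The only pandas call used is DataFrame.to_csv with mode="a".

```python
import os

import pandas as pd


def append_snapshot(df: pd.DataFrame, path: str) -> None:
    """Append df to the CSV at path (hypothetical helper, not from the repo).

    The header row is written only when the file is missing or empty,
    so repeated calls produce one header followed by all rows.
    """
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    df.to_csv(path, mode="a", header=write_header, index=False)
```

Calling append_snapshot(new_df, "eth_usd_book.csv") after the dataframe is built would add that snapshot to the end of the file. Each call reopens the file, which is simple but not the fastest option.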

  2. Let's say you need the dashboard - it's basically the same as above. Looking at the code, I'd recommend doing it in this function in cryptofeed_worker.py:
    # Example taken from https://github.com/bmoscon/cryptofeed/blob/master/cryptofeed/backends/_util.py
    # Used to flatten the order book to construct dataframe for usage in Dash
    def flatten_book(self):
        new_list = []
        for side in (BID, ASK):
            for price, data in self.book[side].items():
                # This format allows for easy transference into a Pandas dataframe
                # This was tested with 5000 entries and no noticeable performance issues were present
                new_list.append({'side': side, self.symbol_string: price, 'size': data})

        new_df = pandas.DataFrame(new_list)
        self.bids = new_df.loc[new_df['side'] == 'bid']
        self.asks = new_df.loc[new_df['side'] == 'ask']
        self.mid_market = (float(self.asks.iloc[0][self.symbol_string]) + float(
            self.bids.iloc[-1][self.symbol_string])) / 2

This function is called any time an update happens. Here you can see me creating new_df, which holds the full flattened book and is the dataframe you can write out.
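Since the function runs on every update, writing to disk inside it could hold up the feed. One possible way around that, sketched here with only the standard library, is to hand each snapshot to a queue and let a background thread do the writing. CsvLogger, its methods, and the timestamp column are all hypothetical names for illustration, not something that exists in the repo.

```python
import csv
import os
import queue
import threading
import time


class CsvLogger:
    """Hypothetical helper: queues snapshots and writes them on a background thread."""

    def __init__(self, path, fieldnames):
        self._path = path
        self._fieldnames = ["timestamp", *fieldnames]
        # Unbounded queue: put() never blocks the caller, at the cost of
        # growing memory if the disk cannot keep up.
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, rows):
        """Stamp a list of row dicts with the current time and enqueue them."""
        ts = time.time()
        self._queue.put([{"timestamp": ts, **row} for row in rows])

    def close(self):
        """Flush everything still queued, then stop the writer thread."""
        self._queue.put(None)  # sentinel telling the writer to exit
        self._thread.join()

    def _run(self):
        need_header = not os.path.exists(self._path) or os.path.getsize(self._path) == 0
        with open(self._path, "a", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=self._fieldnames)
            if need_header:
                writer.writeheader()
            while True:
                batch = self._queue.get()
                if batch is None:
                    break
                writer.writerows(batch)
                fh.flush()
```

Under these assumptions, flatten_book could call something like logger.log(new_list) right after the loop, with the logger created once using the same three column names. Snapshots are written in the order they were queued, but anything still in the queue is lost if the process is killed before close() is called.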

Note: I looked at your CSV and I'm not sure how you'll get the cumulative volume. In my case, this is done by the Dash Plotly webserver. You can see it in this call from webserver.py:
html.Div([
        dcc.Graph(id='live-update-graph',
                  figure=px.ecdf(master.get_books("eth").get_asks(), x='ETH-USD Price', y="size",
                                 ecdfnorm=None,
                                 color="side",
                                 labels={
                                     "size": "ETH",
                                     "side": "Side",
                                     "value": "ETH-USD Price"
                                 },

The ecdf plot is the depth chart which you've seen on the dashboard. With y="size" and ecdfnorm=None, it plots a running total of "size" across the price levels instead of a normalized fraction. More info here: https://plotly.com/python/ecdf-plots/

Thus, if you want to capture the volume - you'll need to add some of your own logic to do so.
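One way that logic could look, as a rough sketch: sort each side by price and take a running sum of "size". The function name add_cumulative_size and the cumulative_size column are made up here, and accumulating outward from the best price (bids descending, asks ascending) is an assumption about what the CSV should contain, not necessarily what the dashboard plot does.

```python
import pandas as pd


def add_cumulative_size(book: pd.DataFrame, price_col: str) -> pd.DataFrame:
    """Return the book with a running total of size per side (hypothetical helper).

    Expects the columns 'side' ('bid' or 'ask'), price_col and 'size'.
    """
    # Prices and sizes may arrive as Decimal or str; convert so sorting and sums behave.
    book = book.assign(**{
        price_col: book[price_col].astype(float),
        "size": book["size"].astype(float),
    })
    # Best bid is the highest price, best ask is the lowest price.
    bids = book[book["side"] == "bid"].sort_values(price_col, ascending=False)
    asks = book[book["side"] == "ask"].sort_values(price_col, ascending=True)
    out = pd.concat([bids, asks], ignore_index=True)
    # Running total within each side, in the row order set above.
    out["cumulative_size"] = out.groupby("side")["size"].cumsum()
    return out
```

Called as add_cumulative_size(new_df, "ETH-USD Price"), this would return a new dataframe with the extra column, which could then be written to the CSV in place of new_df.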

Let me know if this is helpful.

@jlove-dev jlove-dev added the question Further information is requested label Dec 11, 2021
@BGTCapital
Author

Hi
Thank you very much for your detailed answer. What I am actually trying to do is eventually add a feature to your dashboard (fantastic work, BTW) so that data from cryptofeed_worker.py (or Cryptofeed) is published to Google Pub/Sub.

Then I would like to add a feature for the webserver to subscribe to the service and draw the graph. At the same time, I would create a logger that subscribes to the same Google Pub/Sub service and persists the pandas dataframe in real time to Google Cloud Storage in the Parquet binary format.

Would you publish to Google Pub/Sub from cryptofeed_worker.py or from Cryptofeed? Do you have any experience with pandas and Parquet?
Thanks in advance for your support.
Best
