Implement basic http/https service. #9

Closed
Adron opened this Issue Dec 28, 2016 · 6 comments

@Adron (Owner) commented Dec 28, 2016

No description provided.

@Adron Adron self-assigned this Dec 28, 2016

@Adron Adron added the in progress label and removed the ready label Jan 2, 2017

@ninjarobot (Contributor) commented Jan 30, 2017

@Adron have you given any thought to the service structure and API interactions? I'm thinking of building some of this out, and trying to determine what is the most useful. If the consumer is someone making curl calls and feeding those into application input, then returning JSON would be problematic. Or is the consumer likely to always be a client library?

Also, do you think endpoints like /random/city/en, /random/address/en, /random/sentence/en would be reasonable for getting various types of data? The more structured the response, the more parameters it seems will be needed to get sensible data. That is, I don't generally want a totally random sentence; I want a sentence that includes an address, a job, a brand, and a weekday or something like that, e.g. /random/sentence/en?include=job,brand,address,weekday, and then the service would include those in the correct order with some filler words between them so you get something like "Airline pilot at Sears at 9738 Seashore Dr. for fun on Wednesdays."
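Roughly, I'd picture that exchange looking like the following (the response shape here is just a sketch to make the idea concrete, nothing decided):

GET /random/sentence/en?include=job,brand,address,weekday

{
  "sentence": "Airline pilot at Sears at 9738 Seashore Dr. for fun on Wednesdays.",
  "parts": {
    "job": "Airline pilot",
    "brand": "Sears",
    "address": "9738 Seashore Dr.",
    "weekday": "Wednesday"
  }
}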

Dunno...just trying to determine how to make an HTTP API that will be consumable and useful.

@Adron (Owner, Author) commented Jan 31, 2017

Actually, I want to write up exactly what I was imagining with this. The idea was to have an endpoint like /build/en or /build/ru where one passes in JSON (or, in the future, other prospective formats too) that informs the API on what data to generate (AKA build). That JSON would come in various formats (examples of which are available here, where I was creating some default JSON files with those formats). I'm currently writing, as quickly as I can get it done, a blog entry on what I was thinking and will publish it on http://blog.adron.me ASAP. I wanted to get it out last weekend but got a bit distracted. This weekend, however, I will 100% be working diligently on this effort. I'm also hoping to put together a few things for it during the week, but no promises just yet.
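Just to sketch the shape (these field names are hypothetical, not final), a POST to /build/en might carry something like:

{
  "locale": "en",
  "rows": 10,
  "fields": [
    { "name": "first_name", "type": "name_first" },
    { "name": "last_name", "type": "name_last" },
    { "name": "email", "type": "email" }
  ]
}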

I'm going to also add in your ideas above. My idea to create a /build/en or /build/ru is cool and all, but I don't think it should be the only way to put this data together via the API. One other thing I want to add, to the ideas you mentioned and to what I just mentioned, is a way to add either a JSON file or some type of file that states the location to send the data to. For instance, if the data should be put into a database or some system of that sort, or made available for download via file, I want to be able to set that in the JSON or in another file called "connections" or something.

Kind of like:
export-connection-1.json

{
  "source": "postgres",
  "username": "the_user_name",
  "password": "the_secret_password",
  "etc": "other connection requirements"
}

...or maybe add it to the existing JSON data schema request as a JSON array of connections like...

{
  ... other settings up here for the schema request ...

  "export_connection": [
    ...array of connections...
  ]
}

The latter would need the requested schema mapped logically to the source the data should be inserted into. I'm also assuming the connections would need specific drivers built to handle the data being put into the various sources. For instance, an RDBMS SQL insert statement would need to be built specifically and differently than a NoSQL insert into Riak, Cassandra, Neo4j, or whatever the source would be. Thus the schema would definitely need to be matched to the specific types of sources it could or couldn't be inserted into.
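For example (every field here is made up, just to illustrate), a connection entry could declare its source type so the service knows which insert builder to use:

{
  "source": "cassandra",
  "keyspace": "testdata",
  "table": "users",
  "contact_points": ["10.0.0.5"],
  "etc": "other connection requirements"
}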

Anyway, that's some of the ideas, but I'll add more via blog entry real soon and will follow up with a link to it on this thread.

Cheers! - also, thanks for adding the F#, that's bad ass! :)

@Adron Adron changed the title Implement basic http service. Implement basic http/https service. Feb 4, 2017

@Adron (Owner, Author) commented Feb 5, 2017

Alright, here's a more complete description of things: I've posted an entry on my Composite Code blog titled "Data Diluvium Design Ideas". I'm going to also start breaking out the various things to build from the ideas posted in that blog entry into issues here on GitHub, but it'll likely be a week before I get around to that. In the meantime I hope others decide to jump in so we can get a feel for what people would like to have in a service that generates data. 🔢 👍

@ninjarobot (Contributor) commented Feb 6, 2017

@Adron this helps a ton! I was thinking the consumer would be piecing data together from the sources, but really they will send in information about the target environment (i.e. a pgsql schema) and expect to get a response of sample data in the format appropriate for that target environment (i.e. pgsql INSERT statements). Having a separate URL for downloading the data adds a little more complexity (no longer stateless), so I'm going to try to build on this idea as a single request/response, just for simplicity right now. Let me give it a shot in a separate PR and comment on that.
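For instance, the request might look roughly like this (the shape is just my guess to anchor the PR, nothing settled):

{
  "target": "postgres",
  "table": "users",
  "columns": [
    { "name": "id", "type": "uuid" },
    { "name": "first_name", "type": "text" },
    { "name": "last_name", "type": "text" },
    { "name": "email", "type": "text" }
  ]
}

...with the response body being the generated SQL script as plain text.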

@ninjarobot (Contributor) commented Feb 6, 2017

@Adron I would probably expect the tables to already exist, or at least that that's out of the scope of this microservice. It seems like an awkward workflow for a developer to generate their tables based on their test data, and they'll probably have other artifacts, like indexes, that are defined based on business rules or performance considerations that aren't represented here. We'll also probably need to add some means of defining a relationship between tables, because generated data in one table will need to be referenced in another.
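Maybe something like this alongside the table definitions in the request, though I haven't thought the shape through (names invented):

{
  "relationships": [
    { "from": "orders.user_id", "to": "users.id" }
  ]
}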

Another thing I started considering during implementation is that the request should indicate the number of rows to generate, rather than just generating a single row. I'm thinking that in a postgres data generation scenario, we would return a prepared statement and then a series of inserts that use that prepared statement, keeping it safe from SQL injection.

PREPARE userplan (UUID, TEXT, TEXT, TEXT) AS
    INSERT INTO users VALUES($1, $2, $3, $4);

then iterate up to the number of rows requested, generating EXECUTE statements where the parameters are built from fake data:

EXECUTE userplan (generatedUuid, fakeFirstName, fakeLastName, fakeEmailAddress);

so the resulting script is something like this (with illustrative fake data in each EXECUTE):

PREPARE userplan (UUID, TEXT, TEXT, TEXT) AS
    INSERT INTO users VALUES($1, $2, $3, $4);
EXECUTE userplan ('7c9e6679-7425-40de-944b-e07fc1f90ae7', 'Jane', 'Doe', 'jane.doe@example.com');
EXECUTE userplan ('16fd2706-8baf-433b-82eb-8c7fada847da', 'John', 'Smith', 'john.smith@example.com');
EXECUTE userplan ('6ec0bd7f-11c0-43da-975e-2a8ad9ebae0b', 'Sam', 'Jones', 'sam.jones@example.com');
EXECUTE userplan ('fdda765f-fc57-5604-a269-52a7df8164ec', 'Ana', 'Ruiz', 'ana.ruiz@example.com');
... up to the number of rows requested.

@Adron (Owner, Author) commented Feb 7, 2017

@ninjarobot first comment - exactly. :)

Now that I've thought about it some more, having the tables already exist would be logical. That way the service isn't tying itself tightly to the underlying database, merely providing a way to insert data into an existing structure.

Rows - yeah, the number of rows to generate would definitely need to be added.

I like the prepared statement idea, etc.; preventing any SQL injection type issues where we can would be good. 👍

@Adron Adron closed this May 5, 2018

@Adron Adron removed the in progress label May 5, 2018
