Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FakeStream implementation #435

Merged
merged 6 commits into from
Oct 17, 2022
Merged

Conversation

NarekDW
Copy link
Contributor

@NarekDW NarekDW commented Oct 14, 2022

This is the FakeStream simple implementation.
But I have a few doubts here:

  1. Is it worth to move FakeStream and FakeCollection classes to separate folder (like util) ?
  2. For now it doesn't support operations like
    Format.toCsv(FakeStream.Builder...) and Format.toJson(FakeStream.Builder...)
    as it (Csv and Json classes in particular) was designed for FakeCollection only and it will take much effort (and much redundant and similar code) to add such an ability. What do you think, how could we improve the code and avoid of writing a lot of similar code? Probably to add some new level of abstraction? Or adding some converter function, which could convert FakeStream to FakeCollection and the pass collection to the functions like Format.toCsv(FakeStream.Builder...) and Format.toJson(FakeStream.Builder...) ?
  3. Also, do we need a common interface or abstract class for FakeCollection and FakeStream, as them contain quite similar API? From my point of view, here we don't need common interface, as in fact them belong to different subjects...

@codecov-commenter
Copy link

codecov-commenter commented Oct 14, 2022

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 86.44068% with 16 lines in your changes missing coverage. Please review.

Project coverage is 93.04%. Comparing base (5f7318f) to head (c5e9e51).
Report is 1140 commits behind head on main.

Files with missing lines Patch % Lines
...in/java/net/datafaker/sequence/FakeCollection.java 81.81% 3 Missing and 1 partial ⚠️
...main/java/net/datafaker/sequence/FakeSequence.java 90.24% 3 Missing and 1 partial ⚠️
...c/main/java/net/datafaker/sequence/FakeStream.java 80.00% 3 Missing and 1 partial ⚠️
src/main/java/net/datafaker/formats/Json.java 85.71% 1 Missing and 1 partial ⚠️
src/main/java/net/datafaker/formats/Csv.java 93.75% 0 Missing and 1 partial ⚠️
...n/java/net/datafaker/providers/base/BaseFaker.java 66.66% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #435      +/-   ##
============================================
- Coverage     93.07%   93.04%   -0.04%     
- Complexity     2067     2084      +17     
============================================
  Files           229      231       +2     
  Lines          4263     4315      +52     
  Branches        432      441       +9     
============================================
+ Hits           3968     4015      +47     
- Misses          183      185       +2     
- Partials        112      115       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bodiam
Copy link
Contributor

bodiam commented Oct 14, 2022

@snuyanzin can you help in answering some of the above? I'm not sure far you got with your template refactoring.

@snuyanzin
Copy link
Collaborator

Hi @NarekDW great to see here and thanks for the contribution.
Agree about common classes.
First I would say we could completely reuse Builder and would say replace build() methods with collection() to build FakeCollection and stream() to build FakeStream... In theory for backward compatibility build() method could be still kept and marked as deprecated.
What do you think?

I will try to have a deeper look later

@NarekDW
Copy link
Contributor Author

NarekDW commented Oct 14, 2022

Hi @snuyanzin glad to see you 😊
Thank you for the response :)
As I understood, you want to combine FakeCollection and FakeStream into one class and reuse common Builder, right?
If so, then I see some issues here:

  1. Collection and Stream have different nature, Stream is not Collection and vice versa.
  2. At first glance I see a number of challenges in implementation, for example:
    2.1 initialization issue.
    FakeCollection - is always a fixed sized data structure (with default value = 10).
    FakeStream - could be an infinite size data structure.
    2.2 Issue to distinguish which data structure was created by the client (Collection or Stream)...

Correct me if I’m wrong please.
Don't you think that it would be better to keep them (FakeCollection and FakeStream) separately?

@snuyanzin
Copy link
Collaborator

snuyanzin commented Oct 15, 2022

Not really. FakeCollection and FakeStream could be in separate classes
At the same time as you suggested above there could be an interface e.g. Sequence (probably there could be better name).
Both FakeCollection and FakeStream could implement it.
At the same time both builders are the same, so it could be extracted to this Sequence interface or even separately.

Initialization issue could be easily workarounded e.g.

    public static class Builder<T> {
        private final List<Supplier<T>> suppliers;
        private int minLength = -1; // negative means same as maxLength
        private int maxLength = -1; 
        private double nullRate = 0d;
        private BaseProviders faker;
    ...

    public FakeStream<T> buildStream() {
      ...
    }

    public FakeCollection<T> buildCollection() {
        maxLength = maxLength == -1 ? 10 : maxLength;
        ...
     }
}

2.2 Issue to distinguish which data structure was created by the client (Collection or Stream)...

Didn't get this problem. In Faker there could be 2 different methods collection() and stream(). So depends on which was invoked

@NarekDW
Copy link
Contributor Author

NarekDW commented Oct 16, 2022

@snuyanzin Ok, got it. Let me try to apply that interface.
As for me, Sequence - is a good name for common interface.

…hem to util folder + add CSV and Json creation support from FakeSequence
@NarekDW NarekDW force-pushed the feature/fake-stream branch from 697545a to bc796dc Compare October 16, 2022 21:01
# Conflicts:
#	src/main/java/net/datafaker/providers/base/BaseFaker.java
#	src/test/java/net/datafaker/fileformats/CsvTest.java
#	src/test/java/net/datafaker/util/FakeCollectionTest.java
@NarekDW
Copy link
Contributor Author

NarekDW commented Oct 16, 2022

@snuyanzin

  1. I've added an abstract class FakeSequence, which is common class for FakeCollection and FakeStream classes, with common methods/API/Builder (in the child classes we just overwrite the build method).
  2. Also, I changed CSV and Json creation method to support that common (FakeSequence) class, instead of using FakeCollection only. Added support to generate csv and json from finite/infinite stream.
  3. I moved that classes to the separate folder - util. Is it fine from backward compatibility point of view?
  4. I don't like some classes names like CsvCollectionBasedBuilder and JsonFromCollectionBuilder + JsonForFakeCollection. It could be better to name them *Sequence*, but I didn't change them as they are public classes and the change could lead to some issues with client code (regarding backward compatibility)...

Could you review please?
Is this the same way which you mentioned in your previous comment?

@snuyanzin
Copy link
Collaborator

snuyanzin commented Oct 17, 2022

thanks, in general it looks good
there are a few cosmetic comments
1.

I moved that classes to the separate folder - util. Is it fine from backward compatibility point of view?

It's better to have some more self-explanatory names e.g. sequence rather than util
2. Currently there are 2 classes CsvTest within this PR, the one from fileformats package should be removed together with fileformats package

@NarekDW
Copy link
Contributor Author

NarekDW commented Oct 17, 2022

Done. Changed the folder name from util to sequence and changed the error message in case of generating csv or json file from infinite sequence.

public abstract FakeSequence<T> build();

/**
* @param <S> [Stream<T>, List<T>]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is a reason of ci failure, also fails locally with mvn clean install
@NarekDW fyi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, my bad, fixed

@snuyanzin snuyanzin merged commit 7c580d3 into datafaker-net:main Oct 17, 2022
@snuyanzin
Copy link
Collaborator

Merged
thanks for the contribution @NarekDW

@bodiam
Copy link
Contributor

bodiam commented Oct 17, 2022

@NarekDW This is a really great contribution! Love it! Just wondering: would it be possible for you to add something to our documentation, the markdown files I mean? If you don't have time, no problem, I'm happy to port some of your unit tests to the public website (I saw you added something to the README already) and publish it here or so:

https://www.datafaker.net/documentation/collections/

@NarekDW
Copy link
Contributor Author

NarekDW commented Oct 17, 2022

@bodiam Yep, I can do that, but I'm not able to edit the page, which you shared (when I click on the edit button I get 404 not found).
Do you have any guide how to edit the documentation?

@snuyanzin
Copy link
Collaborator

this page is a result of generation based on input here https://github.com/datafaker-net/datafaker/blob/main/docs/documentation/collections.md
so you can just edit this file

@NarekDW
Copy link
Contributor Author

NarekDW commented Oct 17, 2022

Thanks, let me take a look.

@bodiam
Copy link
Contributor

bodiam commented Oct 17, 2022

Thanks!

And sorry about that 404, I've probably configured something wrong, thanks for reporting!

@NarekDW
Copy link
Contributor Author

NarekDW commented Oct 17, 2022

Please review - #443
Is that autogeneration (of docs) a part of CI?
How can I check the result? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants