Commit

Fixed conflicts
marioskrlectildeloop committed Apr 29, 2024
2 parents 9c270fa + d6852c2 commit 9ef6491
Showing 15 changed files with 351 additions and 120 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/go.yml
@@ -23,3 +23,5 @@ jobs:

    - name: Test
      run: go clean -testcache && go test -v -race ./...
    - name: Multiple run
      run: bash multiple_test.sh
1 change: 1 addition & 0 deletions .gitignore
@@ -1 +1,2 @@
.idea
testdata/statistics.csv
16 changes: 16 additions & 0 deletions LICENCE
@@ -0,0 +1,16 @@
Copyright 2024 Mario Škrlec

Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the “Software”), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify,
merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
191 changes: 174 additions & 17 deletions README.md
@@ -1,29 +1,186 @@
> [!CAUTION]
> This package is still in development
> This package is still a work in progress. You can try it out,
> but the API might change in future versions, though not drastically.
# Introduction
With **cig**, you can query a .csv file with SQL syntax.

With **cig**, you can query a .csv file with SQL syntax. It is still in development,
but as time progresses, you will be able to filter data in a .csv file with SQL syntax.
For example:
- [Installation](#installation)
- [Usage](#usage)
- [Why this exists](#why-this-exists)
- [Tasks until finished](#development-tasks-until-the-project-is-finished)

**Important considerations:**

1. Columns to return, columns in WHERE conditions, columns in the ORDER BY clause,
and values must all be enclosed in single quotes. For example:

````sql
SELECT * FROM path:my_data.csv AS e WHERE e.column = 'value'
SELECT 's.ColumnOne', 's.ColumnTwo'
FROM path:path_to_csv.csv AS s WHERE 's.ColumnThree' = 'value'
ORDER BY 's.columnFour', 's.ColumnFive' DESC
````
2. An alias is required. Without the `AS s` part of the above query, the query
will not run.

3. The path to a file must be either absolute or relative to the executing binary.
Prefer an absolute path for better portability.

For now, you can test it only with the above example, or without the **where** clause, which
will return all the rows. The return data type will be `map[string]string`.
4. This project does not and will not implement the entire SQL syntax. Beyond the
tasks outlined in the [Tasks section](#development-tasks-until-the-project-is-finished),
nothing else will be developed apart from making it faster and more maintainable.

5. This project should not be used in production. It is meant only for simple
lookups. In most situations, it is better to import a .csv file into
a database of your choice. This project is intended as "something interesting to do" for
me, so do not take it too seriously.

6. This package will be concurrency safe. This means that the `Run()` method
can be used inside your own concurrency primitives. Although
I will try to make it faster by using concurrency for very large files,
that will not affect how you use the public API in your code.

# Installation

`go get github.com/MarioLegenda/cig`
`go get github.com/MarioLegenda/cig@v0.1.1`

# Usage

The SQL snippet below shows almost all current features of this package:

````sql
SELECT * FROM path:path_to_file.csv AS g WHERE 'g.columnOne' = 'string_value'
AND 'g.columnTwo'::int != '65' OR 'g.columnThree'::float = '56.3'
OFFSET 34
LIMIT 56
ORDER BY 'g.columnFour', 'g.columnFive' DESC
````
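
The `::int` and `::float` suffixes matter because raw CSV values are strings: `'56.30'` and `'56.3'` differ as text but are equal as floats. How the package converts values internally is not shown in this commit; the following is only a minimal sketch of the idea, and `equalAs` is a hypothetical helper, not part of the cig API.

```go
package main

import (
	"fmt"
	"strconv"
)

// equalAs is a hypothetical helper (not part of cig). It reports whether two
// raw string values are equal once both are converted to dataType, which may
// be "int", "float", or "" for a plain string comparison.
func equalAs(dataType, left, right string) (bool, error) {
	switch dataType {
	case "int":
		l, err := strconv.ParseInt(left, 10, 64)
		if err != nil {
			return false, fmt.Errorf("left value %q is not an int: %w", left, err)
		}
		r, err := strconv.ParseInt(right, 10, 64)
		if err != nil {
			return false, fmt.Errorf("right value %q is not an int: %w", right, err)
		}
		return l == r, nil
	case "float":
		l, err := strconv.ParseFloat(left, 64)
		if err != nil {
			return false, fmt.Errorf("left value %q is not a float: %w", left, err)
		}
		r, err := strconv.ParseFloat(right, 64)
		if err != nil {
			return false, fmt.Errorf("right value %q is not a float: %w", right, err)
		}
		return l == r, nil
	case "":
		// No cast given: compare the raw strings.
		return left == right, nil
	default:
		return false, fmt.Errorf("unsupported data type %q", dataType)
	}
}

func main() {
	asFloat, _ := equalAs("float", "56.30", "56.3")
	asString, _ := equalAs("", "56.30", "56.3")
	fmt.Println(asFloat, asString)
}
```

With a cast, the two spellings of the same number compare equal; without one, they are compared as text.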

Instead of `*`, you can specify the columns to return like this:

````sql
SELECT 'g.columnOne', 'g.columnTwo' /** rest of query goes here */
````

If you don't specify `DESC` or `ASC`, `ASC` is assumed.

In code, you use it like this:

````go
package main

import (
	"fmt"
	"github.com/MarioLegenda/cig"
	"log"
)

func main() {
	c := cig.New()

	result := c.Run(`
SELECT * FROM path:path_to_file.csv AS g WHERE 'g.columnOne' = 'string_value'
AND 'g.columnTwo'::int != '65' OR 'g.columnThree'::float = '56.3'
OFFSET 34
LIMIT 56
ORDER BY 'g.columnFour', 'g.columnFive' DESC
`)

	if result.Error != nil {
		log.Fatalln(result.Error)
	}

	fmt.Println(result.SelectedColumns)
	fmt.Println(result.AllColumns)
	fmt.Println(result.Data)
}
````

The result has the following structure:

````go
type Data struct {
	SelectedColumns []string
	AllColumns      []string
	Error           error
	Data            []map[string]string
}
````
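
Because each row is a `map[string]string`, and Go maps iterate in random order, it can help to walk the rows using `SelectedColumns` as the column order. The sketch below redeclares the struct locally so it compiles on its own; `formatRows` is a hypothetical helper, and it assumes the keys of each row map match the entries of `SelectedColumns`, which this commit does not confirm.

```go
package main

import (
	"fmt"
	"strings"
)

// Data mirrors the result struct shown above. It is redeclared here only so
// the sketch compiles without importing the package.
type Data struct {
	SelectedColumns []string
	AllColumns      []string
	Error           error
	Data            []map[string]string
}

// formatRows is a hypothetical helper. It renders every row as one
// comma-separated line and uses SelectedColumns to fix the column order.
// Assumption: row keys are the same strings as in SelectedColumns.
func formatRows(d Data) []string {
	lines := make([]string, 0, len(d.Data))
	for _, row := range d.Data {
		values := make([]string, 0, len(d.SelectedColumns))
		for _, col := range d.SelectedColumns {
			values = append(values, row[col])
		}
		lines = append(lines, strings.Join(values, ","))
	}
	return lines
}

func main() {
	d := Data{
		SelectedColumns: []string{"name", "year"},
		Data: []map[string]string{
			{"name": "alpha", "year": "2023"},
			{"name": "beta", "year": "2024"},
		},
	}
	for _, line := range formatRows(d) {
		fmt.Println(line)
	}
}
```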

You can handle errors with the `errors.Is` function if you need fine-grained
control over exactly which error occurred.

````go
package main

import (
	"errors"
	"fmt"
	"github.com/MarioLegenda/cig"
	cigError "github.com/MarioLegenda/cig/pkg"
	"log"
)

func main() {
	c := cig.New()

	result := c.Run(`
SELECT * FROM path:path_to_file.csv AS g WHERE 'g.columnOne' = 'string_value'
AND 'g.columnTwo'::int != '65' OR 'g.columnThree'::float = '56.3'
OFFSET 34
LIMIT 56
ORDER BY 'g.columnFour', 'g.columnFive' DESC
`)

	if errors.Is(result.Error, cigError.InvalidAlias) {
		log.Fatalln(result.Error)
	}

	fmt.Println(result.SelectedColumns)
	fmt.Println(result.AllColumns)
	fmt.Println(result.Data)
}
````

This is the full list of errors you can use:

````go

var InvalidToken = errors.New("Expected WHERE or LIMIT, OFFSET, ORDER BY, got something else.")
var InvalidSelectToken = errors.New("Expected 'select', got something else.")
var InvalidSelectableColumns = errors.New("Expected selectable column")
var InvalidDuplicatedColumn = errors.New("Duplicated selectable column")
var InvalidFromToken = errors.New("Expected 'FROM', got something else.")
var InvalidFilePathToken = errors.New("Expected 'path:path_to_file' but did not get the path part")
var InvalidAsToken = errors.New("Expected 'as', got something else.")
var InvalidAlias = errors.New("Invalid alias.")
var InvalidColumnAlias = errors.New("Column alias not recognized.")
var InvalidWhereClause = errors.New("Expected WHERE clause, got something else.")
var InvalidConditionColumn = errors.New("Expected condition column.")
var InvalidComparisonOperator = errors.New("Invalid comparison operator")
var InvalidLogicalOperator = errors.New("Invalid logical operator")
var InvalidValueToken = errors.New("Invalid value token.")
var InvalidDataType = errors.New("Invalid data type.")
var InvalidConditionAlias = errors.New("Invalid condition alias.")
var InvalidOrderBy = errors.New("Invalid ORDER BY")

````

# Why this exists

One use case is an environment where it is not possible to install a database
just to look up some values in a .csv file. This package will provide a command line
utility to do so. Otherwise, it is better to import the .csv file into
a database of your choice and query it there.

# Future development tasks (for now)
# Development tasks until the project is finished

- [ ] Implement logical operators
- [ ] Implement all comparison operators (now, only equality works)
- [ ] Implement picking columns to return
- [ ] Implement OFFSET and LIMIT to implement pagination
- [ ] Implement sorting
- [ ] Implement options (cache?, timeout?)
- [ ] Implement goroutine worker balancer (if needed)
- [x] Implement logical operators
- [x] Implement all comparison operators (now, only equality works)
- [x] Implement picking columns to return
- [x] Implement OFFSET and LIMIT to implement pagination
- [x] Implement sorting
- [ ] Create a command line utility to use it on the command line
- [ ] Implement JOIN with multiple files
- [ ] Implement options (cache, timeout with context, extremely simple optional indexing on first query execution)
- [ ] Implement splitting work into multiple goroutines
- [ ] Implement solutions from one billion rows challenge
9 changes: 6 additions & 3 deletions go.mod
@@ -2,16 +2,19 @@ module github.com/MarioLegenda/cig

go 1.22.1

require (
github.com/jedib0t/go-pretty/v6 v6.5.8
github.com/spf13/cobra v1.8.0
github.com/stretchr/testify v1.9.0
)

require (
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/jedib0t/go-pretty/v6 v6.5.8 // indirect
github.com/mattn/go-runewidth v0.0.15 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/rivo/uniseg v0.2.0 // indirect
github.com/spf13/cobra v1.8.0 // indirect
github.com/spf13/pflag v1.0.5 // indirect
github.com/stretchr/testify v1.9.0 // indirect
golang.org/x/sys v0.17.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)
1 change: 1 addition & 0 deletions go.sum
@@ -20,6 +20,7 @@ github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsT
github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
golang.org/x/sys v0.17.0 h1:25cE3gD+tdBA7lp7QfhuV+rJiE9YXTcS3VG1SqssI/Y=
golang.org/x/sys v0.17.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
3 changes: 3 additions & 0 deletions internal/db/conditionResolver/resolveCondition.go
@@ -23,6 +23,9 @@ func ResolveCondition(condition syntaxStructure.Condition, metadata ColumnMetada
	head := condition
	var prevOp string

	if head == nil {
		return false, fmt.Errorf("Invalid condition head. This is internal error and a bug.")
	}
	// setup
	for head != nil {
		next := head.Next()
11 changes: 11 additions & 0 deletions internal/db/selectedColumnMetadata/columnMetadata.go
@@ -7,6 +7,7 @@ type columnMetadata struct {

type ColumnMetadata interface {
	Column(pos int) string
	Position(name string) int
	Names() []string
	HasPosition(pos int) bool
}
@@ -25,6 +26,16 @@ func (cm columnMetadata) Column(pos int) string {
	return ""
}

func (cm columnMetadata) Position(name string) int {
	for p, s := range cm.names {
		if s == name {
			return cm.positions[p]
		}
	}

	return -1
}

func (cm columnMetadata) HasPosition(pos int) bool {
	for _, s := range cm.positions {
		if s == pos {
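
The new `Position` method reads as a lookup over two parallel slices: `names[i]` is the header of the column stored at CSV index `positions[i]`, and `-1` signals a missing name. A minimal standalone sketch of that behaviour follows; the field types are an assumption, since the struct definition is not part of this hunk.

```go
package main

import "fmt"

// columnMetadata keeps two parallel slices. The field types are assumed;
// only the field names appear in the diff.
type columnMetadata struct {
	names     []string
	positions []int
}

// Position returns the CSV index recorded for name, or -1 when the name is
// unknown, mirroring the method added in this commit.
func (cm columnMetadata) Position(name string) int {
	for i, n := range cm.names {
		if n == name {
			return cm.positions[i]
		}
	}
	return -1
}

func main() {
	cm := columnMetadata{
		names:     []string{"year", "name"},
		positions: []int{2, 0},
	}
	fmt.Println(cm.Position("year"))
	fmt.Println(cm.Position("missing"))
}
```

Callers need to check for `-1` before using the result as a slice index.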
53 changes: 24 additions & 29 deletions internal/job/searchFn.go
@@ -26,12 +26,6 @@ func SearchFactory(
if err != nil {
return nil, fmt.Errorf("Error in job %d while reading file. Trying to skip the first row but failed: %w", id, err)
}
limit := constraints.Limit()
offset := constraints.Offset()
orderBy := constraints.OrderBy()

var currentCollectedLimit int64
var currentCollectedOffset int64

collectionFinished := false

@@ -56,41 +50,46 @@ func SearchFactory(
break
}

if offset != nil && currentCollectedOffset < offset.Value() {
currentCollectedOffset++

continue
}

if limit != nil && currentCollectedLimit == limit.Value() {
collectionFinished = true
break
}

if condition != nil {
ok, err := conditionResolver.ResolveCondition(condition, metadata, lines)
if err != nil {
return nil, fmt.Errorf("Error in job %d while reading from the file: %w", id, err)
}

if ok {
if limit != nil {
currentCollectedLimit++
}
/* v, _ := strconv.ParseInt(lines[2], 10, 64)
if v < 2023 {
fmt.Println(v, ok)
}*/

if ok {
collectedLines = append(collectedLines, lines)
}
} else {
if limit != nil {
currentCollectedLimit++
}

collectedLines = append(collectedLines, lines)
}
}
}

limit := constraints.Limit()
offset := constraints.Offset()
orderBy := constraints.OrderBy()

if orderBy != nil {
sortResults(collectedLines, orderBy, metadata)
}

var currentCollectedOffset int64

for _, line := range collectedLines {
if offset != nil && offset.Value() != currentCollectedOffset {
currentCollectedOffset++
continue
}

if limit != nil && int64(len(results)) == limit.Value() {
break
}

res, err := createResult(line, selectedColumns)
if err != nil {
return nil, fmt.Errorf("Error in job %d while reading from the file: %w", id, err)
@@ -99,10 +98,6 @@ func SearchFactory(
results = append(results, res)
}

if orderBy != nil {
return sortResults(results, orderBy), nil
}

return results, nil
}
}
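
The effect of this hunk is that OFFSET and LIMIT are now applied after all matching rows have been collected and sorted, instead of while the file is being read. That ordering matters: paginating before sorting would return a page of the unsorted rows. The sketch below illustrates the sort, skip, then cut sequence on plain string rows; `paginate` is a hypothetical function, and it compares values as text, whereas the real `sortResults` may behave differently.

```go
package main

import (
	"fmt"
	"sort"
)

// paginate is a hypothetical illustration, not the package's code. It sorts
// rows by the column at index col (plain string comparison), skips offset
// rows, and keeps at most limit rows. A negative limit means "no limit".
func paginate(rows [][]string, col, offset, limit int) [][]string {
	// Sort a copy so the caller's slice is left untouched.
	sorted := make([][]string, len(rows))
	copy(sorted, rows)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i][col] < sorted[j][col]
	})

	// Clamp the offset to the valid range before slicing.
	if offset < 0 {
		offset = 0
	}
	if offset > len(sorted) {
		offset = len(sorted)
	}
	sorted = sorted[offset:]

	if limit >= 0 && limit < len(sorted) {
		sorted = sorted[:limit]
	}
	return sorted
}

func main() {
	rows := [][]string{{"c"}, {"a"}, {"d"}, {"b"}}
	for _, row := range paginate(rows, 0, 1, 2) {
		fmt.Println(row[0])
	}
}
```

The trade-off is that every matching row has to be held in memory before the page is cut, which is the price of a correct ORDER BY.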